The Residue Number System (RNS) has received attention over the
last  two  decades  because  the  advances  in  memory  technology  combined 
with  the  system's  ability  for  high-speed  arithmetic  provide 
attractive  solutions  in  many  real-time  Digital  Signal  Processing 
(DSP)  applications.  More  recently  the  Quadratic  Residue  Number 
System  (QRNS)  was  introduced  which  offers  significant  hardware 
savings  and  throughput  enhancements  for  the  case  of  complex 
multiplicative intensive environments. This dissertation extends the
concept of the QRNS and develops the Polynomial Residue Number System
(PRNS). The PRNS is derived in terms of polynomial rings and the
Chinese Remainder Theorem (CRT) and meets Winograd's lower
bound for multiplication count. The system is theoretically
developed,  conditions  for  its  existence  are  studied,  and 
applications  of  it  are  discussed  in  the  areas  of  complex  and  real 
arithmetic  as  well  as  multidimensional  processing.  It  can  be 
claimed  that  the  results  of  this  research  make  feasible  the  solution 
of  many  ultra-high-speed  DSP  problems  which  were  previously 
considered  to  be  unsolved. 
CHAPTER 1
INTRODUCTION 


In  the  last  two  decades,  the  area  of  Digital  Signal  Processing 
(DSP)  has  experienced  a period  of  rapid  growth  with  substantial 
successful innovations. While DSP researchers have responded to
requirements for high speed and low cost, a number of problems still
demand such ultra-high-speed processing that the current generation
of computers can handle them only "off-line." Such problems typically
arise in multidimensional signal processing and in certain real-time
applications. Faster systems must be designed in order to
solve the problems of real-time processing.

The  design  of  ultra-high-speed,  low-cost  digital  systems  is 
approached  in  this  dissertation  utilizing  recent  advancements  made 
in a 2000-year-old branch of mathematics known as "Residue
Arithmetic."

The  Residue  Number  System  (RNS)  has  received  increased  attention 
because  advances  in  semiconductor  technology  combined  with  the 
system's  potential  for  concurrent  and  carry-free  high-speed 
arithmetic  are  very  attractive  for  computation  intensive  problems. 
The  RNS  is  an  integer  number  system  defined  over  a set  of  relatively 
prime  integers  called  moduli.  Unlike  weighted  number  systems  such 
as  the  binary  and  decimal,  the  digits  of  a number  in  the  RNS  are  all 
equally  significant.  This  implies  that  residue  arithmetic  can  be 
performed  in  parallel  without  any  carry  propagation.  Every  number 
in the RNS is mapped into an L-tuple and operations are of the form
c = (a ∘ b) mod m, where ∘ denotes addition, subtraction, or
multiplication. By making use of high-speed semiconductor memories
for  table  look-up  arithmetic,  higher  computation  speeds  can  be 
obtained  using  the  RNS  rather  than  binary  arithmetic.  Due  to  the 
fact  that  division  and  magnitude  comparison  are  difficult  operations 
in  the  RNS,  it  is  clear  that  such  a system  cannot  be  used  for 
general  purpose  computing  but  only  for  special  purpose  designs  that 
require  no  divisions  and  no  conditional  branching.  One  such  case 
where  RNS  can  be  used  with  success  is  in  Digital  Signal  Processing. 

Although the RNS has contributed significantly to the design of
ultra-high-speed signal processing systems, there are still some
problems that may not be solvable in real time even with the parallel
RNS. Such problems require a large number of multidimensional
operations. A prime example is the multiplication of two complex
numbers, which calls for six real operations (four real multiplies
and two real adds). The computation of an FFT is a representative
example of a complex-multiplication-intensive environment. Another
example of a highly complex multidimensional-operation environment is
a multi-channel communication correlator receiver. A multi-channel
signal entering the receiver has to be correlated with another
multi-channel signal, and the resulting process can be too complex to
perform in real time.

Another important problem in RNS arithmetic is the address space
limitation of high-speed semiconductor memories. The width of each
RNS channel is proportional to the dynamic range that the system must
cover. In other words, if a large dynamic range is desired, then each
RNS channel must be sufficiently wide so that the entire dynamic
range can be covered. Because increasing the address space of a
memory table slows it down, a larger dynamic range can require memory
tables so large and so slow that the advantages obtained from the
parallel nature of the RNS are lost. One solution to this problem is
to decompose each RNS channel into subchannels, each having a smaller
channel width and thus requiring memory tables with a smaller address
space.

This  dissertation  attempts  to  provide  solutions  to  the  two  main 
problems  discussed  above.  The  primary  tool  used  is  a newly 
developed processing system, the so-called Polynomial Residue Number
System of order N (PRNS). Applications of this system are discussed.
These applications are classified into two categories: The first
category includes applications that involve an N-dimensional
problem to be processed with the PRNS of order N but
with less complexity than previously achieved. The second category
includes applications of dimension one or two that are first encoded
as N-dimensional problems (in order to reduce the channel width) and
are then processed with the minimal possible computational
complexity, using the PRNS of order N.

The work is principally theoretical and strictly mathematically
oriented. Applications are discussed but no complete design
implementations are given. Future research may lead to
implementations based on the PRNS and to the development of
ultra-high-speed, low-cost, low-complexity designs. The present work
provides the necessary theoretical tools for that research.

The  dissertation  is  organized  as  follows:  Chapter  2 contains  a 
comprehensive  survey  of  the  existing  literature  on  the  subject  of 
Residue Number System arithmetic. It mainly surveys the
literature on some difficult RNS operations such as sign
detection, division, overflow detection, and scaling, as well as
RNS applications. Chapter 3 presents the Quadratic
Residue  Number  System  (QRNS)  which  is  actually  a PRNS  of  order  two. 
The  QRNS  was  known  in  the  literature  prior  to  this  work  and  has  been 
the  chief  motivation  for  the  development  of  the  PRNS.  The  QRNS  is 
presented  theoretically  and  its  use  in  complex  arithmetic  is 
discussed. The results of applying the QRNS in a radix-4 FFT design
are  also  summarized.  Chapter  4 deals  with  the  theory  of  the  PRNS 
and Chapter 5 discusses cases where a real or complex multiply can
be encoded as an N-dimensional problem, resulting in reduced channel
width aimed at hardware savings and increased speed. Chapter 6
mainly  deals  with  N-dimensional  problems  and  how  they  can  be 
processed  with  minimal  complexity  using  the  PRNS  of  order  N.  In 
Chapter 7, the research is summarized, its significance is brought
out, and directions for future research are briefly presented.


CHAPTER  2 

REVIEW  OF  THE  RESIDUE  NUMBER  SYSTEM  (RNS) 
2.1  Introduction  to  the  RNS 


2.1.1  Historical  Perspective 

What  is  the  number  that  divided  by  3 leaves  a remainder  of  2, 
divided  by  5 a remainder  of  3,  and  divided  by  7 a remainder  of  2? 
This is the question that Sun Tzu asked around 100 A.D., thus
establishing the so-called Chinese Remainder Theorem and the area of
residue number arithmetic [Sod86b, Tay84a]. Coincidentally, the Greek
mathematician Nicomachus reached similar conclusions working
independently of Sun Tzu around the same period of 100 A.D. [Gro66a,
Knu69a].

Residue number arithmetic research shows a large gap from the time
of Sun Tzu and Nicomachus until the eighteenth and nineteenth
centuries, when the great mathematicians Euler, Fermat, and Gauss
developed the theoretical basis for the modern theory of the RNS.

It was not until the mid-to-late 1950s that residue
arithmetic found applications in computers. The independence of the
RNS  channels  provides  advantages  in  fault-tolerant  computing, 
something  that  was  observed  by  the  Czechoslovakians  Svoboda  and 
Valach,  who  performed  experiments  on  an  RNS  computer  studying  error 
codes  [Svo55a,  Svo57a,  Svo59a],  as  well  as  by  Garner  [Gar59a]. 
A principal contribution in the theory of RNS arithmetic was
offered  by  Szabo  and  Tanaka  at  Lockheed  in  the  1960s  [Tan62a].  In 
1967  they  wrote  a text  outlining  the  use  of  the  RNS  in  digital 
computers  [Sza67a]. 

To  date,  work  in  RNS  has  demonstrated  that  this  processing  system 
is  excellent  in  performing  high-speed  additions,  subtractions,  and 
multiplications,  but  it  definitely  shows  difficulties  in  other 
operations  such  as  division,  magnitude  comparison,  and  sign  detection. 
As  a result  the  RNS  is  not  very  suitable  for  designing  general  purpose 
computers,  but  extremely  attractive  for  special  purpose  applications 
which  involve  a great  number  of  additions,  subtractions,  and 
multiplications.  One  such  special  purpose  application  is  digital 
signal  processing,  which  was  already  developed  by  the  middle  1960s. 
The  work  of  Cheney  [Che61a],  who  designed  an  RNS  digital  correlator, 
demonstrates  the  use  of  the  Residue  Number  System  in  digital  signal 
processing. 

It  was  in  the  1970s,  however,  that  researchers  started  giving 
extensive  attention  to  the  RNS.  That  is  when  the  technology 
revolution  occurred  which  resulted  in  the  development  of  high-speed 
low-cost  RAMs  and  ROMs,  providing  the  RNS  the  ability  to  perform  its 
operations  in  a table  look-up  environment.  Since  that  time  hundreds 
of  studies  have  been  published  which  refer  to  the  RNS  and  its 
application  in  digital  signal  processing. 
2.1.2 Mathematical Preliminaries

The  main  mathematical  preliminaries  of  the  Residue  Number  System 
are  found  in  Algebra  and  Number  Theory  [Lip81a],  [McC79a],  [Sod86b]. 
Some  basic  knowledge  which  might  be  useful  to  the  reader  of  this 
document is provided here. It is necessary to clarify the notation
"x mod m": x mod m denotes the remainder of the division of x by m,
where both x and m are integers.

For a given integer m, the set $S_m = \{0, 1, \ldots, m - 1\}$ with
modulo m addition and multiplication forms a finite field if m = p is
a prime number [Tay84a]. Such a finite field is denoted by
$\{S_p, +, \cdot\}$ and is called a Galois field GF(p). In such a
field the multiplicative inverse $x^{-1}$ of every nonzero element
$x \in S_p$ always exists. This $x^{-1}$ is defined by the equation
$(x \cdot x^{-1}) \bmod m = 1$.

For m non-prime, the structure $\{S_m, +, \cdot\}$ is a finite ring,
R(m), with a weaker algebraic structure than that of a finite field,
in the sense that not all the nonzero elements in the ring have a
multiplicative inverse.

Another difference between a finite field and a finite ring is
that every finite field $S_p$ is generated by at least one generator
$g \in S_p$, where g is a number whose first p - 1 powers together
with 0 form the entire field, while in the case of a finite ring
there does not exist any generator that produces the entire ring.
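The two properties above (existence of inverses and of generators in GF(p), and their absence in a ring R(m) with m non-prime) can be checked numerically. The following is a minimal sketch, not part of the original development; the function names are illustrative assumptions, and the inverse is computed through Fermat's little theorem, which is only one of several possible methods.

```python
def multiplicative_inverse(x, p):
    """Return x^-1 in GF(p) via Fermat's little theorem (p prime, x != 0 mod p)."""
    if x % p == 0:
        raise ValueError("0 has no multiplicative inverse")
    return pow(x, p - 2, p)


def is_generator(g, m):
    """True if the powers g^1 .. g^(m-1) mod m cover every nonzero element of S_m."""
    powers = {pow(g, k, m) for k in range(1, m)}
    return powers == set(range(1, m))


def generators(m):
    """List every g in S_m whose first m - 1 powers give all nonzero elements."""
    return [g for g in range(1, m) if is_generator(g, m)]


if __name__ == "__main__":
    print(multiplicative_inverse(3, 7))  # inverse of 3 in GF(7)
    print(generators(7))                 # generators of GF(7)
    print(generators(8))                 # empty list: R(8) has no generator
```

For the prime modulus 7 the list of generators is non-empty, while for the non-prime modulus 8 it is empty, in agreement with the statement above.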

Special moduli choices of great interest are $m = 2^n - 1$,
$m = 2^n$, and $m = 2^n + 1$. These moduli choices are important
because modulo addition and multiplication are easily mechanized.

In the case of an L moduli system, $\{m_1, m_2, \ldots, m_L\}$, where
any two moduli $m_i$, $m_j$ are relatively prime and
$M = \prod_{i=1}^{L} m_i$, the ring R(M) is isomorphic to the direct
sum of the L subrings $R(m_i)$, i = 1, 2, ..., L, and the
computations can be performed in the decomposed system.

2.1.3  Mathematical  Primitives 

The  Residue  Number  System  is  an  integer  system  defined  in  terms  of 
a set  of  relatively  prime  moduli  [Tay84a],  [Ram83b].  Suppose  that  the 
moduli  set  is  denoted  by  P,  where 


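$$P = \{m_1, m_2, \ldots, m_L\}, \quad \mathrm{GCD}(m_i, m_j) = 1 \ \text{for} \ i \neq j \qquad (2.1)$$

then any integer $X \in Z_M$, where

$$Z_M = \{0, 1, \ldots, M - 1\}, \quad M = \prod_{i=1}^{L} m_i \qquad (2.2)$$

has a unique RNS representation given by

$$X \xrightarrow{\mathrm{RNS}} (X_1, X_2, \ldots, X_L) \qquad (2.3)$$

where $X_i = X \bmod m_i$, i = 1, 2, ..., L.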

For signed numbers with $X \in [-M/2, M/2]$, X is represented as in
(2.3) with $X_i = X \bmod m_i$ if $X \geq 0$ and
$X_i = (M - |X|) \bmod m_i$ otherwise.
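The unsigned and signed encodings just described can be sketched as follows. This is an illustrative sketch under stated assumptions: the function names are hypothetical, and the signed range is taken as the integers from -(M // 2) to (M - 1) // 2 so that no two values share a representation when M is even.

```python
from math import gcd, prod


def check_moduli(moduli):
    """Raise ValueError unless the moduli are pairwise relatively prime."""
    for i, a in enumerate(moduli):
        for b in moduli[i + 1:]:
            if gcd(a, b) != 1:
                raise ValueError(f"moduli {a} and {b} are not relatively prime")


def to_rns(x, moduli):
    """Encode an unsigned integer X in [0, M - 1] as the L-tuple (X mod m_i)."""
    check_moduli(moduli)
    M = prod(moduli)
    if not 0 <= x < M:
        raise ValueError("X lies outside the dynamic range [0, M - 1]")
    return tuple(x % m for m in moduli)


def to_rns_signed(x, moduli):
    """Encode a signed integer: X itself if X >= 0, otherwise M - |X|."""
    check_moduli(moduli)
    M = prod(moduli)
    if not -(M // 2) <= x <= (M - 1) // 2:
        raise ValueError("X lies outside the signed dynamic range")
    value = x if x >= 0 else M - abs(x)
    return tuple(value % m for m in moduli)


if __name__ == "__main__":
    print(to_rns(23, (3, 5, 7)))         # residues of 23
    print(to_rns_signed(-1, (3, 5, 7)))  # residues of M - 1 = 104
```

With the moduli (3, 5, 7), M = 105 and every integer in the dynamic range receives a distinct 3-tuple.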

The main potential of the RNS for performing ultra-high-speed
arithmetic comes from the fact that this system is a carry-free,
totally parallel processing system. In other words, if
$X, Y, Z \in Z_M$ and $\phi$ denotes the operations add, subtract,
and multiply (i.e., +, -, ·), then $Z = X \,\phi\, Y$ satisfies

$$Z \xrightarrow{\mathrm{RNS}} (Z_1, Z_2, \ldots, Z_L); \quad Z_i = (X_i \,\phi\, Y_i) \bmod m_i, \ i = 1, 2, \ldots, L \qquad (2.4)$$

where $X \xrightarrow{\mathrm{RNS}} (X_1, X_2, \ldots, X_L)$ and $Y \xrightarrow{\mathrm{RNS}} (Y_1, Y_2, \ldots, Y_L)$ are the
RNS  representations  of  X and  Y.  Equation  (2.4)  implies  that  the  RNS 
performs  its  operations  in  L totally  parallel  channels,  thus  being 
suitable  for  ultra-high-speed  processing.  Another  attractive  point  is 
that  due  to  today's  high  speed  semiconductor  memory,  each  channel 
operation $(X_i \,\phi\, Y_i) \bmod m_i$ can be realized as a table look-up statement
with  the  same  complexity  and  time  delay  involved  in  all  RNS 
operations,  thus  making  the  RNS  extremely  attractive  for  VLSI 
implementations  [Tay84a]. 
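The channel-wise operation of equation (2.4) and its table look-up realization can be illustrated with a short software model. This is only a sketch: the nested lists stand in for the ROM or RAM tables of a hardware design, and all names are hypothetical.

```python
import operator

# The three operations that are closed and carry-free in the RNS.
OPERATIONS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}


def build_table(op, m):
    """Precompute (a op b) mod m for every residue pair; table[a][b] is the result."""
    return [[op(a, b) % m for b in range(m)] for a in range(m)]


def build_unit(moduli):
    """Build one look-up table per operation and per channel."""
    return {name: [build_table(op, m) for m in moduli]
            for name, op in OPERATIONS.items()}


def rns_apply(unit, name, x, y):
    """Apply an operation to two residue tuples, one independent look-up per channel."""
    return tuple(table[a][b] for table, a, b in zip(unit[name], x, y))


if __name__ == "__main__":
    moduli = (3, 5, 7)
    unit = build_unit(moduli)
    x = tuple(6 % m for m in moduli)   # X = 6
    y = tuple(4 % m for m in moduli)   # Y = 4
    print(rns_apply(unit, "mul", x, y))  # residues of 24
```

Every look-up touches a single channel, so the L channels could be evaluated concurrently, and each operation costs the same single table access.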

It is the integer and unweighted nature of the RNS, however, that
makes the operations of division, sign detection, and magnitude
comparison difficult [Ram83b].

Overflow  prevention  and  scaling  due  to  overflow  are  not  trivial 
operations  in  the  RNS.  In  order  to  protect  the  RNS  from  the  risk  of 
overflow,  scaling  of  the  magnitude  of  the  variables  must  take  place  to 
keep  them  within  the  given  dynamic  range,  which  is  [0,  M - 1]  for 
unsigned numbers and [-M/2, M/2 - 1] for signed [Sod86b].

To convert an RNS representation to an integer, there are
primarily two ways, namely the Chinese Remainder Theorem and the
Mixed Radix Conversion [Tay84a]. They are stated here without proof,
and are based on the moduli choice $\{m_1, m_2, \ldots, m_L\}$.

Chinese  Remainder  Theorem  (CRT) 

If $X \xrightarrow{\mathrm{RNS}} (X_1, X_2, \ldots, X_L)$, $M = \prod_{i=1}^{L} m_i$, $M_i = M/m_i$, and $N_i = M_i^{-1} \bmod m_i$, then

$$X = \left( \sum_{i=1}^{L} X_i M_i N_i \right) \bmod M \qquad (2.5)$$
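Equation (2.5) can be evaluated directly in software. The sketch below is illustrative only: the function name is an assumption, and the inverse $N_i$ is obtained with Python's three-argument `pow` (negative exponents require Python 3.8 or later). It is applied to Sun Tzu's question from Section 2.1.1.

```python
from math import prod


def crt(residues, moduli):
    """Recover X from its residues using X = (sum of X_i * M_i * N_i) mod M."""
    M = prod(moduli)
    total = 0
    for x_i, m_i in zip(residues, moduli):
        M_i = M // m_i              # product of all the other moduli
        N_i = pow(M_i, -1, m_i)     # multiplicative inverse of M_i modulo m_i
        total += x_i * M_i * N_i
    return total % M


if __name__ == "__main__":
    # Remainders 2, 3, 2 when divided by 3, 5, 7.
    print(crt((2, 3, 2), (3, 5, 7)))
```

Note that the final reduction is modulo M, the full dynamic range, which is the costly step in a hardware realization.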


Mixed  Radix  Conversion  (MRC) 


$$X = X'_1 + m_1 X'_2 + m_1 m_2 X'_3 + \cdots + m_1 m_2 \cdots m_{L-1} X'_L \qquad (2.6)$$
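In the mixed-radix form the coefficients $X'_i$ are the mixed-radix digits, with $0 \leq X'_i < m_i$. One common way to obtain them, shown here as a sketch rather than as the specific procedure of any cited work, is to peel off one digit at a time: subtract the current digit and divide by the current modulus in every remaining channel. The function names are hypothetical.

```python
def mixed_radix_digits(residues, moduli):
    """Return the mixed-radix digits X'_1 .. X'_L of the number with these residues."""
    r = list(residues)
    digits = []
    for i, m_i in enumerate(moduli):
        d = r[i] % m_i
        digits.append(d)
        # In each remaining channel: subtract the digit, then divide by m_i,
        # i.e. multiply by the inverse of m_i modulo m_j (Python 3.8+).
        for j in range(i + 1, len(moduli)):
            m_j = moduli[j]
            r[j] = ((r[j] - d) * pow(m_i, -1, m_j)) % m_j
    return digits


def from_mixed_radix(digits, moduli):
    """Evaluate X = X'_1 + m_1 X'_2 + m_1 m_2 X'_3 + ... using ordinary integers."""
    x, weight = 0, 1
    for d, m in zip(digits, moduli):
        x += d * weight
        weight *= m
    return x


if __name__ == "__main__":
    digits = mixed_radix_digits((2, 3, 2), (3, 5, 7))
    print(digits, from_mixed_radix(digits, (3, 5, 7)))
```

Unlike the CRT, every step works modulo a single $m_j$, so no modulo M adder is needed, at the price of a sequential dependence between the digits.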


One very important case of an RNS is based on the moduli choice
$P = \{2^n - 1, 2^n, 2^n + 1\}$, established by Taylor and called the
3-moduli RNS [Tay81a]. Such a 3-moduli RNS offers large dynamic
range, ease of performing arithmetic, and efficiency in
residue-to-decimal conversion as well as scaling.


2.2 Other Operations


In  this  section  subjects  related  to  residue  arithmetic,  including 
error  correction  and  detection,  are  described. 

2.2.1  Sign  Detection 

Banerji  [Ban74a]  studied  some  algorithms  for  sign  detection  and 
performed  a comparison  between  speed  of  residue  and  binary  arithmetic. 
After  this  comparison  was  performed,  it  was  shown  that  significant 
speed  gains  can  be  obtained  from  residue  arithmetic  only  in  cases 
where  the  number  of  algebraic  sign  detections  is  small  compared  to  the 
number  of  arithmetic  operations.  Conditions  under  which  one 
arithmetic  system  provides  higher  speed  compared  to  the  other  are  also 
presented  by  Banerji  in  terms  of  Winograd's  lower  bounds  on  addition 
and  multiplication  times.  Later,  starting  from  the  canonical  sum-of- 
products expression for the sign function, Banerji [Ban75a]
transformed  the  expression  to  a form  whose  realization  is  simpler  than 
the  canonical  form  realization,  thus  simplifying  the  sign  detection 
problem. 

Ulman  [Ulm83a]  discussed  a new  method  of  sign  detection.  His 
method  has  the  advantage  of  providing  the  possibility  for  simultaneous 
execution  of  the  following  two  operations:  residue  to  mixed-radix 
conversion  of  the  number  magnitude  and  sign  detection  in  one  and  the 
same  circuit.  Vu  [Vu85a]  presented  an  efficient  implementation  of  the 
Chinese  Remainder  Theorem.  This  new  implementation  was  fast  and 
simple  because  it  reduced  the  sum  modulo  M in  the  conversion  formula 
to  a sum  modulo  2 through  the  use  of  fractional  representation.  This 
technique  accelerated  sign  detection  and  other  operations  based  on 
magnitude  comparison. 

2.2.2  Division 

A method  of  dividing  based  on  determining  the  reciprocal  of  the 
divisor  using  an  iteration  technique  has  been  studied  by  Vyshynsky  and 
Petushchak  [Vys73a].  Banerji  et  al.  [Ban81a]  proposed  a method  for 
choosing  an  approximate  divisor,  approximate  dividends,  and  partial 
quotients, thus providing a high-speed division technique. At about
the same time, Gregory [Gre81a] presented a method to perform residue
arithmetic with rational operands. The rational number a/b was first
mapped to the integer $a \cdot b^{-1} \bmod p$, then the arithmetic
was performed in GF(p), and finally the result was mapped to its
rational equivalent using his method.
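The forward mapping of a rational operand into GF(p) can be sketched as below. Only the forward map $a/b \rightarrow a \cdot b^{-1} \bmod p$ stated above is shown; the inverse map back to a rational number belongs to Gregory's method and is not reproduced here. The function name is an assumption.

```python
from fractions import Fraction


def rational_to_gf(q, p):
    """Map the rational a/b to the integer a * b^-1 mod p (p prime, p does not divide b)."""
    q = Fraction(q)
    if q.denominator % p == 0:
        raise ValueError("the denominator has no inverse modulo p")
    return (q.numerator * pow(q.denominator, -1, p)) % p  # Python 3.8+


if __name__ == "__main__":
    p = 101
    a, b = Fraction(1, 2), Fraction(2, 3)
    # Adding the images in GF(p) gives the image of the rational sum.
    print((rational_to_gf(a, p) + rational_to_gf(b, p)) % p)
    print(rational_to_gf(a + b, p))
```

Because the map respects addition and multiplication, the arithmetic itself can be carried out entirely on integers modulo p.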
2.2.3 Overflow Detection

In a weighted number system the task of overflow detection is
almost  trivial,  since  this  information  is  provided  by  the  carry  out  of 
the  most  significant  digit  location.  Regrettably,  the  same  task  is 
not  that  easy  in  residue  arithmetic,  due  to  its  modular  nature.  Arora 
and  Sharma  [Aro78a]  described  a technique  for  the  detection  of 
additive  overflow  in  a residue  number  system  where  the  moduli  are  not 
relatively  prime  (redundant  residue  number  system).  Debnath  and 
Pucknell  [Deb78a]  studied  the  multiplicative  overflow  and  provided  a 
method for detecting it for the specific moduli choice {16, 15}.
Huang [Hua83a] presented a new fully parallel mixed-radix conversion
(MRC) algorithm which utilizes the maximum parallelism that exists in
the RNS to mixed-radix (MR) digit conversion to achieve a high
throughput  rate  and  very  short  conversion  time.  The  short  conversion 
time  was  achieved  first  by  separating  the  table  look-up  and  the 
arithmetic  operations  such  that  they  both  can  be  performed 
simultaneously,  and  second  by  processing  the  arithmetic  operations  in 
parallel  in  each  modulus.  An  efficient  and  fast  way  of  overflow 
detection  was  obtained  based  upon  the  mixed-radix  (MR)  digits  of  a 
residue  number  being  known.  Taylor  [Tay83a]  presented  a technique  by 
which  an  overflow-free  RNS  product  can  be  computed  rapidly  in  a 
limited  amount  of  commercially  available  hardware.  The  16-bit 
autoscaled  residue  multiplier  is  based  on  a 4k  RAM  and  it  internally 
manages  the  register  overflow  problem.  Therefore,  general-purpose 
arithmetic  units  and  digital  filters  may  be  configured  to  work  within 
a fixed  16-bit  word  length. 
2.2.4 Scaling

Scaling  is  a very  useful  operation  in  the  RNS  because  it  is  used 
to  prevent  overflow.  Because  scaling  is  not  closed  in  the  RNS,  the 
scaled  result  has  to  be  rounded  to  its  nearest  integer.  Two  main 
disadvantages of the scaling operation are that such operations are
usually memory-intensive and that the dynamic range of the
scaled variable is smaller than that of the unscaled. Using ROM
arrays,  Jullien  [Jul78a]  presented  a memory-intensive  algorithm  for 
scaling.  Using  table  lookups  and  a data  compression  technique,  Taylor 
and Huang [Tay82b] presented a very elegant scaling method which
does not have the above two disadvantages. This novel technique was
called the autoscale algorithm and was applied for the case of the
3-moduli RNS $\{2^n - 1, 2^n, 2^n + 1\}$.

Miller  and  Polky  [Mil83a]  presented  a new  scaling  algorithm,  which 
is  based  on  the  Chinese  Remainder  Theorem  and  performs  scaling  during 
a single  clock  period.  Later,  Miller  and  Polky  [Mil84a]  presented 
another  scaling  algorithm  which  is  based  on  the  mixed-radix  conversion 
process  and  permits  scaling  by  a fixed  but  arbitrary  scale  factor  in  k 
clock  periods,  where  k is  the  number  of  moduli  used. 

2.2.5  Residue  to  Decimal  Conversion 

One  of  the  fundamental  problems  with  residue  arithmetic  is  the 
difficulty  associated  with  residue-to-decimal  conversion.  There  are 
basically  two  methods  used  in  RNS-to-decimal  conversion.  They  are  the 
Chinese  Remainder  Theorem  (CRT)  and  the  Mixed-Radix  Conversion  (MRC). 

A hardware  implementation  of  the  Chinese  Remainder  Theorem  based 
on  the  distributed  arithmetic  filter  of  Peled  and  Liu  [Pel74a]  was 
presented by Jenkins and Leon [Jen77a]. In that case a modulo M
adder-shifter  network  was  required.  An  implementation  based  on  the 
Mixed-Radix  Conversion  technique  was  described  by  Baraniecka  and 
Jullien  [Bar78a].  This  method  does  not  require  a modM  adder.  Most  of 
the  RNS-to-decimal  conversion  techniques  reported  in  the  literature 
require  a significant  hardware  investment  and  consume  a 
disproportionate  amount  of  run-time  compared  to  other  RNS 
computational  operations  (addition,  subtraction,  and  multiplication). 
Taylor  and  Ramnarayanan  [Tay81d]  presented  three  new  RNS-to-decimal 
conversion  techniques  for  small  moduli  (<  4 bits  each),  medium  moduli 
(<  6 bits  each),  and  large  moduli  (<  12  bits  each).  Their  work  was 
restricted to the three moduli set $P = \{2^n + 1, 2^n, 2^n - 1\}$. A mixed-radix
conversion  algorithm  for  this  particular  moduli  set  was  also 
presented.  Compared  to  conventional  RNS-to-decimal  conversion 
algorithms,  the  derived  algorithm  possessed  the  following  attributes: 
no  modulo  M addition  required  as  in  the  case  of  CRT  or  MRC  methods, 
practical  realization  of  very  large  moduli  RNS  system,  and  simple 
architecture  and  reduced  complexity.  Huang  [Hua83a]  presented  a fully 
parallel  mixed-radix  conversion  (MRC)  algorithm,  which  utilizes  the 
maximum  parallelism  that  exists  in  the  residues  (RNS)  to  mixed-radix 
(MR)  digit  conversion  to  achieve  high  throughput  rate  and  very  short 
conversion  time.  Soderstrand  et  al.  [Sod83a]  discussed  a new 
technique  for  RNS-to-binary  conversion  based  on  the  CRT  which  allows 
conversion  with  only  one  level  of  ROM  and  one  level  of  adders. 
Compared  to  previous  CRT  or  MRC  techniques,  the  new  technique  offers 
conversion  times  which  are  independent  of  the  number  of  moduli  of  the 
Residue  Number  System  and  conversion  times  are  usually  much  faster 
than competitive techniques. Alia and Martinelli [Ali84a] proposed a
VLSI  computing  architecture  for  converting  an  integer  number  N from 
the  weighted  binary  representation  into  and  out  of  a residue  code 
based  on  a S moduli.  Later,  Bernardson  [Ber85a]  described  an 
algorithm and its hardware implementation which converts the 3-moduli
$\{2^n - 1, 2^n, 2^n + 1\}$ residue numbers into their binary
representation. His technique required only binary adders and no
table look-ups. The result is that a 66-bit converter with a
conversion time of 120 ns or a 36-bit one with 40 ns can be realized
as single CMOS chips with 3μ geometries.

2.2.6  Symmetric  RNS 

The  Symmetric  Residue  Number  System  (SRNS)  provides  some 
advantages  over  the  conventional  RNS  in  representing  and  processing 
negative numbers. Its range of residues mod p is [-p/2, p/2], while
the range in a conventional RNS is [0, p - 1]. The tasks of sign
detection, magnitude comparison, and determination of the additive
inverse are less complex in the SRNS than in the standard RNS. An
algorithm  for  dividing  in  the  SRNS  was  given  by  Kinoshita  et  al. 
[Kin70a].  This  algorithm  calls  for  the  tables  of  the  residue 
representations  of  the  numbers.  Each  number  is  determined  by  the 
product  of  certain  coefficients  and  the  quotient  is  obtained  as  an 
iterative  sum  of  the  table  entries.  Another  iterative  method  of 
division  in  the  SRNS  developed  by  Kinoshita  et  al.  [Kin73a]  requires 
the  availability  of  two  tables  of  the  symmetric  residue 
representations  of  a certain  kind  of  integer.  The  same  authors 
presented  the  logical  implementation  of  arithmetic  operations  in  the 
SRNS as well as floating point algorithms [Kin73b, Kin74a]. A sign
detection  technique  in  the  SRNS  that  avoids  the  slow  process  of  the 
mixed-radix  conversion  was  given  by  Kaushik  and  Arora  [Kau81a]  while 
Clemens  provided  a modified  definition  of  the  SRNS  with  improved 
scaling  and  overflow  detection  properties  [Cle85a]. 
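The symmetric residue representation can be illustrated with a small sketch. It assumes an odd modulus p, for which the symmetric range is exactly the integers from -(p - 1)/2 to (p - 1)/2; the function name is hypothetical and the code does not reproduce any of the cited algorithms.

```python
def symmetric_residue(x, p):
    """Return the residue of x modulo an odd p in the range [-(p - 1)/2, (p - 1)/2]."""
    if p % 2 == 0:
        raise ValueError("this sketch assumes an odd modulus")
    r = x % p                       # conventional residue in [0, p - 1]
    return r - p if r > p // 2 else r


if __name__ == "__main__":
    p = 7
    print([symmetric_residue(x, p) for x in range(p)])
    # The additive inverse of a symmetric residue is simply its negation.
    print(symmetric_residue(-5, p), -symmetric_residue(5, p))
```

In this representation the additive inverse of a residue needs no subtraction from the modulus, which is one reason negative numbers are easier to handle channel by channel.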

2.2.7  Arithmetic  Units 

The  design  of  arithmetic  units  such  as  adders,  subtractors,  and 
multipliers  for  the  RNS  has  been  studied  by  most  researchers  using 
table look-up implementations. Using 4k-bit memories (i.e., a
12-bit-wide address), the moduli size was limited to 6 bits, since
two 6-bit operands together form one 12-bit address. Yau and Chung
[Yau76a]  considered  modulo  arithmetic  as  cyclic  groups  based  on  a 
decomposed  mapping  approach.  Mapping  relations  and  a binary  encoding 
method  have  been  investigated  and  a new  class  of  code  called  the 
circulative  code  has  been  developed  with  methods  for  generating  such  a 
code presented. A novel design for a modulo $(2^n - 1)$ adder, which uses
only  2-input  gates  and  is  characterized  by  a low  hardware  investment 
O(n^)  and  delay  O(logn),  has  been  described  by  Bioul  et  al.  [Bio75a]. 
Agarwal  [Aga78a]  presented  a technique  for  representing  numbers  modulo 
$(2^n + 1)$ and described fast addition schemes using carry-lookahead and
modular  complementation,  while  Taylor  and  Huang  [Tay81b]  have  reported 
on  a floating  point  residue  arithmetic  unit  with  overflow  and  sign 
detection. 

Since the nonzero residues mod p form a cyclic group under
multiplication when p is prime, the realization of a mod p multiplier
is quite simple through the use of a technique similar to logarithms.
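The logarithm-like technique can be sketched as an index calculus: each nonzero residue is replaced by its exponent with respect to a generator g, the exponents are added modulo p - 1, and the result is mapped back. The table layout and names below are illustrative assumptions, not the design of any cited multiplier.

```python
def build_index_tables(p, g):
    """Build the log (index) and antilog tables of GF(p) for a generator g."""
    antilog = [pow(g, k, p) for k in range(p - 1)]
    if len(set(antilog)) != p - 1:
        raise ValueError("g is not a generator of GF(p)")
    log = {value: k for k, value in enumerate(antilog)}
    return log, antilog


def index_multiply(a, b, p, log, antilog):
    """Compute (a * b) mod p with one index addition instead of a multiplication."""
    a, b = a % p, b % p
    if a == 0 or b == 0:
        return 0                    # zero has no index and is handled separately
    return antilog[(log[a] + log[b]) % (p - 1)]


if __name__ == "__main__":
    p, g = 7, 3
    log, antilog = build_index_tables(p, g)
    print(index_multiply(3, 5, p, log, antilog))
```

The multiplier thus reduces to two small look-ups and one modulo p - 1 adder, with zero operands detected separately.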
A multiplication scheme suitable for any prime or non-prime moduli
choice has been described by Soderstrand and Vernia [Sod80a]. Taylor
made a very important contribution by choosing the moduli set
$\{2^n - 1, 2^n, 2^n + 1\}$, which was called the 3-moduli RNS
[Tay81a]. Such a moduli choice enjoys the advantage of a large
dynamic range of approximately 3n bits although the number of moduli
paths is only three. Taylor designed high-speed multipliers that can
compute equivalent 18-bit full precision products at a pipelined rate
of $28.5 \times 10^6$ multiplies per second using high density
memories and the quarter squared algorithm
$AB \bmod p = ((A + B)^2 \cdot 4^{-1} - (A - B)^2 \cdot 4^{-1}) \bmod p$,
where $4^{-1}$ is the multiplicative inverse of 4 modulo p. Taylor's
main contribution was that although 4 is not relatively prime to
$2^n$ the multiplier still works in $Z_{2^n}$. Initially, Taylor used 11 bits for
each  one  of  his  3-moduli  RNS  channels  due  to  memory  address  space 
limitations,  something  that  was  later  overcome  using  VLSI  technology 
[Tay82a].  Later,  Taylor  [Tay83a]  presented  a technique  by  which  an 
overflow-free  RNS  product  can  be  computed  rapidly  in  a limited  amount 
of commercially available hardware. The system made use of the
popular 3-moduli set $\{2^n - 1, 2^n, 2^n + 1\}$. Based on 4k
high-speed memory technology, a practical 16-bit multiplier could be
configured. Since this 16-bit autoscaled residue multiplier
internally manages the
register  overflow  problem,  general-purpose  arithmetic  units  and 
digital  filters  may  be  configured  to  work  within  a fixed  16-bit  word 
length. 
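The quarter squared algorithm mentioned in this section can be sketched for an odd modulus p, where $4^{-1}$ exists. The sketch does not cover the modulus $2^n$ case attributed to Taylor above, and the table layout and names are assumptions made for illustration.

```python
def quarter_square_table(p):
    """table[s] = s^2 * 4^-1 mod p for every residue s (requires an odd modulus p)."""
    inv4 = pow(4, -1, p)            # raises ValueError if 4 has no inverse mod p
    return [(s * s * inv4) % p for s in range(p)]


def quarter_square_multiply(a, b, p, table):
    """Compute (a * b) mod p as table[a + b] - table[a - b], all modulo p."""
    return (table[(a + b) % p] - table[(a - b) % p]) % p


if __name__ == "__main__":
    p = 31
    table = quarter_square_table(p)
    print(quarter_square_multiply(12, 17, p, table), (12 * 17) % p)
```

The benefit is in table size: a direct product table is addressed by both operands, while the quarter-square table is addressed by a single residue, at the cost of two look-ups and a few modular additions.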

Soderstrand  and  Fields  [Sod77a]  reported  on  two  multipliers  for 
completely  general  fractional  multiply,  while  Huang  and  Taylor 
[Hua79a],  based  on  some  symmetry  properties  found  in  modular 
arithmetic matrices, discovered a memory compression scheme that
reduces  the  memory  requirements  by  almost  a factor  of  four. 

Multiplication is central to the implementation of Fermat number
transforms (FNT) and other residue number algorithms. Therefore,
there is a need for a good multiplication algorithm that can be
realized easily on a VLSI chip. Chang et al. [Cha85a, Cha85b]
modified the Leibowitz multiplier to realize multiplication in the
ring of integers modulo a Fermat number. Leibowitz's algorithm
takes the modulo only after the full product has been obtained,
which costs time. In the new algorithm, the modulo is taken in
every bit operation of the multiplication, so no time is lost in a
separate reduction step. Furthermore, this algorithm requires
only a sequence of cyclic shifts and additions. The design of this
new multiplier is regular, simple, and expandable, and therefore
suitable for VLSI implementation.
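
The idea of reducing after every bit step, rather than once at the end, can be sketched as follows. This is a generic interleaved shift-and-add multiplier modulo a Fermat number $F_t = 2^{2^t} + 1$, written under the assumption that both operands are already reduced; it is not the circuit of Chang et al., and the name `fermat_mulmod` is hypothetical.

```python
def fermat_mulmod(a: int, b: int, t: int) -> int:
    """Return (a * b) mod F_t, with F_t = 2**(2**t) + 1.

    Assumes 0 <= a, b < F_t. The running sum is reduced after every
    shift and every addition, so it never grows beyond 2 * F_t.
    """
    n = 1 << t                  # F_t = 2**n + 1
    f = (1 << n) + 1
    acc = 0
    # Operands can be as large as 2**n, so they need n + 1 bits.
    for i in reversed(range(n + 1)):
        acc <<= 1               # shift: double the partial result
        if acc >= f:            # acc < 2f, so one subtraction suffices
            acc -= f
        if (b >> i) & 1:        # add the multiplicand for a set bit
            acc += a
            if acc >= f:
                acc -= f
    return acc


print(fermat_mulmod(16, 16, 2))   # 256 mod 17 = 1
```

Because every intermediate value stays below $2F_t$, each reduction is a single conditional subtraction, which is the property that makes a bit-serial hardware realization regular.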

2.2.8  Error  Correction  in  RNS 

Error  correction  and  detection  in  the  RNS  is  of  great  importance 
since  the  RNS  is  an  unweighted  system. 

A method  of  providing  error  correction  using  redundant  moduli  was 
provided  by  Mandelbaum  [Man72a].  Initially,  he  used  two  redundant 
moduli  for  single  error  correction  but  then  extended  the  method  to 
multiple  errors. 
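
The principle behind redundant moduli can be shown with a minimal sketch. It is not Mandelbaum's procedure: it only demonstrates single-error detection, and the moduli sets and the names `encode` and `error_detected` are assumptions chosen for illustration. A valid number lies in the range defined by the information moduli, and a corrupted residue makes the reconstructed value fall outside that range.

```python
from math import prod

INFO_MODULI = (3, 5, 7)         # hypothetical information moduli
REDUNDANT_MODULI = (11, 13)     # hypothetical redundant moduli (larger than the rest)
MODULI = INFO_MODULI + REDUNDANT_MODULI
LEGITIMATE_RANGE = prod(INFO_MODULI)   # valid numbers satisfy 0 <= X < 105


def crt(residues, moduli):
    """Reconstruct the integer in [0, M) from its residues (Chinese Remainder Theorem)."""
    M = prod(moduli)
    total = 0
    for r, m in zip(residues, moduli):
        Mi = M // m
        total += Mi * ((pow(Mi, -1, m) * r) % m)
    return total % M


def encode(x):
    """Residue representation of x over information plus redundant moduli."""
    return tuple(x % m for m in MODULI)


def error_detected(residues):
    """A word is flagged when its reconstruction leaves the legitimate range."""
    return crt(residues, MODULI) >= LEGITIMATE_RANGE


word = encode(52)
corrupted = list(word)
corrupted[1] = (corrupted[1] + 2) % MODULI[1]   # inject a single-residue error
print(error_detected(word), error_detected(corrupted))   # False True
```

Locating and correcting the faulty residue requires additional steps (for example, reconstructing with one modulus left out at a time), which this sketch does not attempt.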

Skavantzos [Ska82] and Skavantzos and Taylor [Ska83] presented
2-dimensional and 3-dimensional product codes for correcting random
errors and burst errors in RNS units. They also discussed a new
decoding technique, the so-called "error image decoding technique,"
which provides hardware and time efficiency suitable for high RNS
speeds.

An error checker with reduced hardware complexity and its VLSI
implementation were presented by Jenkins [Jen82a, Jen83a].
Ramachandran [Ram83a] presented a new method to correct single errors
in an n-residue number system through the use of r redundant moduli.
The method required [2n/r] + 2 recombinations of n residues in the
worst case. This was proven to be of lower complexity than any other
known method. Krogmeier and Jenkins [Kro83a] analyzed properties for
error detection and correction for a Quadratic Residue Number System
(QRNS) with redundant moduli. In their method, in order to provide
error detection/correction, redundancy was added in terms of r
additional moduli $m_{L+1}, m_{L+2}, \ldots, m_{L+r}$. These must be chosen
so that all the moduli in the system are relatively prime,
$(m_i, m_j) = 1$ for $i \ne j$ and i, j = 1, 2, ..., L+r. An additional
constraint imposed by the QRNS is that all moduli have prime
factorizations consisting of powers of prime numbers of the form
4n + 1. Jenkins [Jen84a] showed that an error checker for an RNS
digital processor can be easily realized using the Chinese Remainder
Theorem and a concept called biased addition. Biased addition
simplifies the mechanization of modular addition with respect to a
large modulus, as required in the Chinese Remainder Theorem, and it
also simplifies the generation of the complete set of projections
$X_j = (X) \bmod m_j$, j = 1, ..., L+r, as required for error location
and correction. The resulting architecture for the error checker was
considerably simpler than those previously reported based on
mixed-radix conversion. At the same time, Bell and Jenkins
[Bel84a] presented a design for an experimental device that
converted data from residue representation to mixed-radix
representation while simultaneously checking for single-digit errors.
The experimental system had a high-speed pipelined architecture and
operated with five 5-bit moduli, two of which were redundant.

Barsi and Maestrini [Bar80a] studied RNS codes where redundancy
was introduced by removing the constraint that the moduli be pairwise
prime. These types of codes were found to provide a wide coverage of
random errors. The Reed-Solomon and Goppa error correction codes are
based upon the Chinese Remainder Theorem. This fact was observed by
Mizumachi and Kamiya [Miz79a], who presented a method for encoding
based on an implementation of the CRT. Barsi and Maestrini [Bar78b]
provided redundancy by considering the RNS range to be divided into
intervals of equal width and defining the magnitude index of a number
X as an integer locating X in one such interval. Using this
redundancy, they provided error detection and correction. They also
provided a lower
bound  for  redundancy  allowing  p-error  correction  in  arithmetic  codes 
based  on  the  RNS.  Shiozaki  et  al.  [Shi75a]  exploited  the  symmetric 
nature  of  error  correcting  codes  based  on  the  redundant  RNS  and 
proposed  a method  suitable  for  error  correction,  while  Rao  [Rao81a] 
focused  on  problems  relating  to  the  arithmetic  in  GF(p),  for  p prime, 
since  the  implementation  of  these  operations  was  found  to  be  important 
to  the  construction  of  error  detecting  and  correcting  codes. 



2.3 RNS Applications

The applications that are suitable for RNS implementations
are computation intensive and allow no conditional branching. A
number of such applications are now presented.

2.3.1  Digital  Filters 

Jenkins  and  Leon  [Jen77a]  reported  on  implementing  a finite 
impulse  response  (FIR)  digital  filter  in  the  RNS  and  compared  the  RNS 
implementation  of  a 64th-order  dual  bandpass  filter  with  other  filter 
structures  to  illustrate  tradeoffs  between  speed  and  hardware 
complexity.  The  result  was  that  the  RNS  FIR  filter  was  shown  to 
support  much  higher  filtering  rates  than  conventional  binary  filters. 
Soderstrand et al. [Sod80b] studied a totally adaptive digital filter
for system identification that used RNS arithmetic and multiple
microprocessors, one for the arithmetic of each modulus
channel.
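
A minimal sketch of an RNS FIR filter follows. It is not the design of any of the cited papers: the moduli, the signed-number convention, and the function names are assumptions for illustration. Each modulus channel filters independently, and the Chinese Remainder Theorem recombines the channel outputs at the end.

```python
from math import prod

MODULI = (29, 31, 32)   # hypothetical pairwise relatively prime moduli, M = 28768


def crt(residues, moduli):
    """Reconstruct the integer in [0, M) from its residues."""
    M = prod(moduli)
    total = 0
    for r, m in zip(residues, moduli):
        Mi = M // m
        total += Mi * ((pow(Mi, -1, m) * r) % m)
    return total % M


def fir_rns(x, h, moduli=MODULI):
    """FIR filter y[n] = sum_k h[k] x[n-k], computed independently per modulus."""
    M = prod(moduli)
    channels = []
    for m in moduli:
        xr = [v % m for v in x]          # input samples in this channel
        hr = [c % m for c in h]          # coefficients in this channel
        yr = []
        for n in range(len(x)):
            acc = 0
            for k, c in enumerate(hr):
                if n - k >= 0:
                    acc = (acc + c * xr[n - k]) % m
            yr.append(acc)
        channels.append(yr)
    y = []
    for n in range(len(x)):
        v = crt([ch[n] for ch in channels], moduli)
        y.append(v - M if v >= M // 2 else v)   # map [0, M) back to signed values
    return y


print(fir_rns([3, -1, 4, 1, -5, 9, 2, -6], [2, -3, 1, 5]))
```

The result is exact only while every true output stays inside the signed dynamic range $[-M/2, M/2)$, which is why scaling and overflow prevention receive so much attention in the RNS filter literature.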

Etzel  and  Jenkins  [Etz82a]  focused  on  the  problem  of  scaling  for 
overflow  prevention  in  RNS  recursive  digital  filters  and  described 
special  residue  classes  in  which  scaling  becomes  a simple  operation. 
Leung  [Leu81a]  studied  the  application  of  the  RNS  to  complex  digital 
filters  and  showed  a way  by  which  a complex  product  can  be  performed 
with  only  two  integer  multiplications.  Complex  residue  arithmetic  for 
high-speed  processing  of  complex  waveforms  has  been  considered  by 
Jenkins  [Jen80a]  and  has  been  used  for  processing  digital  complex 
waveforms in communications and radar systems.

An adaptive digital control problem was studied by Tan and McInnis
[Tan81a]. The implementation was based on the minimum variance
control law and the LMS algorithm, using the RNS. Soderstrand and
Kelley [Sod81a] designed a low-cost adaptive FIR filter implementing
the LMS algorithm based on the RNS. They based their design on a
multiplier previously developed by Soderstrand and Vernia [Sod80a].
Etzel and Jenkins [Etz80a] used the redundant residue number system to
provide error detection and correction in digital filters.

Jenkins showed [Jen78a] that combining the distributed
arithmetic multiplication scheme of Peled and Liu with residue number
architectures yields a hybrid architecture with better speed/cost
ratios than either structure alone, while Wolenty
and Jenkins [Wol80a] provided a realization of a second-order elliptic
filter using two microprocessors to realize this hybrid architecture.
Two more RNS digital filters were described by Soderstrand [Sod78a]:
a microprocessor-based lowpass digital filter in a mass spectrometer
experiment and an adaptive digital filter.

Shubs  [Shu80a]  observed  that  digital  signal  processing  algorithms 
can  be  classified  into  non-iterative  and  iterative  ones  and  proved 
that  the  RNS  can  be  advantageously  used  in  non-iterative  algorithms 
(for  more  than  128  samples  in  the  processed  signal),  whenever  the 
volume  of  non-modular  operations  does  not  exceed  about  25  percent  of 
the  volume  of  modular  operations. 

Soderstrand and Sinha [Sod84b] discussed a pipelined recursive
Residue Number System digital filter. Through their technique,
pipelined IIR filters based on RNS Read-Only-Memory (ROM) table
look-up techniques can be designed which offer throughput rates equal to
the table look-up time of the ROMs. This high-speed realization can
be achieved even though the recursive filter algorithm requires
multiple delays in realizing the output of the filter.

Shah  et  al.  [Sha85a]  proposed  a hardware  structure  for 
implementing  two-dimensional  (2-D)  recursive  digital  filters  based  on 
the  residue  number  system  (RNS).  The  parallel  pipelined  structure 
arising  from  the  use  of  RNS  arithmetic  facilitated  video  bandwidth 
filtering.  Their  realization  called  for  the  use  of  ROMs  and  first-in- 
first-out  (FIFO)  shift  registers  as  the  building  blocks  of  the  system. 

The 3-moduli set $\{2^n - 1,\ 2^n,\ 2^n + 1\}$ residue number system has been
shown to possess several attractive properties. In particular, the
problem of scaling is simplified through the use of an autoscale
algorithm. Using the autoscale algorithm, Ramnarayan
and  Taylor  [Ram85a]  studied  a class  of  RNS  recursive  digital  filter 
structures  in  the  context  of  their  precision.  The  filter  forms 
differed  from  one  another  in  the  placement  of  the  autoscale  unit.  The 
error  introduced  by  the  scaling  algorithm,  called  the  scaling  error, 
is  the  counterpart  of  roundoff  error  in  fixed-point  arithmetic. 
Formulas  for  predicting  the  scaling  error  variance  for  the  different 
filter  architectures  were  developed  and  optimal  values  for  the  various 
parameters  of  the  RNS  filters  were  derived. 

Soderstrand  and  Escott  [Sod86a]  investigated  the  feasibility  of 
combining  Multiple  Valued  Logic  with  Residue  Number  System  Arithmetic. 
They  developed  a very  detailed  block  diagram  representation  of  a 
Multiple Valued Logic (MVL)-RNS digital filter independent of the
choice of levels in MVL or moduli in RNS. A detailed simulation of an
Emitter Coupled Logic (ECL) realization of a 10-weight MVL-RNS digital
filter showed that a 30 MHz throughput was achievable with 8-bit
equivalent input/output data and 16-bit equivalent computation. This
ECL implementation would require less than 20,000 components, an
average of 2.5 VLSI chips, and less than 13 W per filter weight.

2.3.2  Discrete  Fourier  Transform  (DFT) 

The  implementation  of  the  Winograd  Fourier  Transform  Algorithm 
(WFTA)  using  the  RNS  was  studied  by  Huang  and  Taylor  [Hua81b],  who 
described  a new  technique  for  input/output  reordering  that  required  no 
extra  memory  for  reordering  when  the  moduli  set  of  the  RNS  was 
carefully  chosen  to  contain  the  base  numbers  of  the  WFTA.  Huang  and 
Taylor  also  computed  the  scaling  overhead  for  RNS  implementations  of 
the  radix  2 and  4 DFTs,  the  Good  Winograd  Fourier  Transform  (GWFT), 
and  the  general  N Winograd  transform,  while  Taylor  and  Huang  provided 
a detailed study of these DFT implementations [Tay81c]. Tseng et al.
[Tse79a] developed optimum procedures for choosing scaling factors as
well as the position of scaling arrays in RNS FFT structures, while
Tseng et al. [Tse78a] provided a prediction of the RMS relative error,
which includes A/D quantization, integer normalization, and scaling
rounding considerations, for RNS-based FFTs.

2.3.3  2-D  Processing 

Fouse et al. [Fou81a] designed a residue-based image processor for
image understanding using VLSI technology. The processor
operated on a 5 x 5 kernel; of significant interest is an LSI
custom circuit performing the bulk of the residue computations. An
RNS processor for 2-D convolution based on a 5 x 5 kernel was
described by Huang et al. [Hua81a]. It was shown that this processor
was capable of performing 30M 5 x 5 filter convolutions per second for
2-D signal detection. Using ECL technology for the table look-up
operations and with optional error detection, the hardware would run
error free at a rate of 20 Mops.


CHAPTER  3 

THE  QUADRATIC  RESIDUE  NUMBER  SYSTEM  (QRNS) 

A multitude of important signal processing and communication
applications, such as the computation of the FFT and the autocorrelation
or cross-correlation of multiphase communication signals, require a
large number of complex multiplications [Tay85a, Opp75a, Rab75a,
Opp78a]. In such complex multiplication intensive environments,
the conventional RNS alone might not be an adequate solution if high
throughput is required. This is due to the fact that a complex
multiplication requires four real multiplications and two real
additions, and is at the same time a two-level operation. In this
chapter, a solution that reduces the complexity of a complex
multiplication is presented. This solution is known in the
literature as the Quadratic Residue Number System (QRNS).

3.1 The Theory of the QRNS

In a conventional Residue Number System, a complex
multiplication requires four real multiplications and two real
additions per modulus. The Quadratic Residue Number System (QRNS)
changes this requirement significantly. This system is defined with
an isomorphic mapping originally suggested in the literature by
Vanwormhoudt [Van78a] and later on by Leung [Leu81a] and Krogmeier
and Jenkins [Kro83a]. This useful isomorphic mapping, denoted by $f_2$,
is now described. Consider a modulus channel with modulus m and a
complex number Z = x + jy, where $x, y \in Z_m$, $j = \sqrt{-1}$ (the imaginary
operator), and $Z_m$ is the ring of integers mod m. Then the QRNS
mapping is an isomorphic mapping $f_2$ of C(m) onto $Z_m \times Z_m$, where
$C(m) = \{x + jy : x, y \in Z_m\}$. This isomorphic mapping $f_2$ satisfies the
following equations:

Forward mapping $f_2$: $(x + jy) \rightarrow (z, z^*)$

Inverse mapping $f_2^{-1}$: $(z, z^*) \rightarrow (x + jy)$

$$z = (x + jy) \bmod m, \qquad z^* = (x - jy) \bmod m \tag{3.1}$$

$$x = (2^{-1}(z + z^*)) \bmod m, \qquad y = (2^{-1} j^{-1}(z - z^*)) \bmod m \tag{3.2}$$

where $x, y, z, z^* \in Z_m$.

In equations (3.1) and (3.2) the number j is not an imaginary
number but an integer belonging to the ring of integers mod m. That
is, $j \in Z_m$ such that j is a solution of the second-order congruence

$$x^2 \equiv -1 \bmod m \tag{3.3}$$

or j satisfies

$$j^2 \equiv -1 \bmod m \tag{3.4}$$

The entries $2^{-1}$ and $j^{-1}$ in equation (3.2) are the multiplicative
inverses of 2 and j modulo m, satisfying

$$(2 \cdot 2^{-1}) \bmod m = 1, \qquad (j \cdot j^{-1}) \bmod m = 1 \tag{3.5}$$

and $2^{-1}, j^{-1} \in Z_m$. The structure $Z_m \times Z_m = Z_m^2$, which is called
the QRNS, is also a finite ring consisting of $m^2$ elements.
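
Equations (3.1) and (3.2) translate directly into code. The sketch below is illustrative only: the function names are assumptions, and the root j is found by exhaustive search, which is adequate for the small moduli used in table look-up designs.

```python
def find_j(m: int) -> int:
    """Return the smallest j in Z_m with j*j = -1 (mod m), or raise if none exists."""
    for j in range(m):
        if (j * j + 1) % m == 0:
            return j
    raise ValueError(f"x^2 = -1 has no solution modulo {m}")


def qrns_forward(x: int, y: int, m: int, j: int):
    """Equation (3.1): map x + jy to the pair (z, z*)."""
    return (x + j * y) % m, (x - j * y) % m


def qrns_inverse(z: int, z_star: int, m: int, j: int):
    """Equation (3.2): recover (x, y) from (z, z*)."""
    inv2 = pow(2, -1, m)     # multiplicative inverse of 2 modulo m
    invj = pow(j, -1, m)     # multiplicative inverse of j modulo m
    return (inv2 * (z + z_star)) % m, (inv2 * invj * (z - z_star)) % m


m = 13
j = find_j(m)                           # j = 5, since 25 = 2 * 13 - 1
pair = qrns_forward(3, 4, m, j)
print(j, pair, qrns_inverse(*pair, m, j))   # 5 (10, 9) (3, 4)
```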

The rules of composition in the finite structure C(m)
(the conventional system) are

Addition:

$$(x_1 + jy_1) + (x_2 + jy_2) = (x_1 + x_2) \bmod m + j(y_1 + y_2) \bmod m \tag{3.6}$$

Multiplication:

$$(x_1 + jy_1) \cdot (x_2 + jy_2) = (x_1 \cdot x_2 - y_1 \cdot y_2) \bmod m + j(x_1 \cdot y_2 + x_2 \cdot y_1) \bmod m$$

where $x_1, y_1, x_2, y_2 \in Z_m$ and $j = \sqrt{-1}$. From equation (3.6) it is
clear that in the conventional system a complex addition requires
two real additions per modulus channel, while a complex
multiplication requires four real multiplications in a first level
and two real additions in a second level per modulus channel.

In the QRNS the complex multiplication becomes significantly
simpler. In such a system the rules of composition are

Addition:

$$(z_1, z_1^*) + (z_2, z_2^*) = [(z_1 + z_2) \bmod m,\ (z_1^* + z_2^*) \bmod m] \tag{3.7}$$

Multiplication:

$$(z_1, z_1^*) \cdot (z_2, z_2^*) = [(z_1 \cdot z_2) \bmod m,\ (z_1^* \cdot z_2^*) \bmod m]$$

where $z_1, z_2, z_1^*, z_2^* \in Z_m$. Equation (3.7) indicates that in the
QRNS, the complex multiplication requires only two real
multiplications and one level of operations per modulus channel.
This decrease from six operations to two and from two
levels of operations to one implies a reduction in the amount of
hardware as well as a speed increase when using the QRNS. Clearly,
the QRNS does not require any cross-products when performing a
complex multiplication; instead, it decomposes the multiplication into
two parallel operations. At the same time, if both the additions
and multiplications in the QRNS are realized as table look-up
operations, then the amount of hardware required as well as the
speed of complex addition and complex multiplication are the
same. In this way the QRNS is very attractive for VLSI
implementations.
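
The difference between the two multiplication rules can be made concrete with a small sketch. The modulus m = 17 with root j = 4 and all function names are assumptions chosen for illustration; the conventional rule uses four multiplications and two additions, while the QRNS rule uses two independent multiplications.

```python
M, J = 17, 4     # hypothetical channel: 4 * 4 = 16 = -1 (mod 17)


def mul_conventional(a, b, m=M):
    """Complex product in C(m): four multiplications, then two additions."""
    (x1, y1), (x2, y2) = a, b
    return (x1 * x2 - y1 * y2) % m, (x1 * y2 + x2 * y1) % m


def to_qrns(a, m=M, j=J):
    """Forward mapping (3.1)."""
    x, y = a
    return (x + j * y) % m, (x - j * y) % m


def from_qrns(p, m=M, j=J):
    """Inverse mapping (3.2)."""
    z, z_star = p
    inv2, invj = pow(2, -1, m), pow(j, -1, m)
    return (inv2 * (z + z_star)) % m, (inv2 * invj * (z - z_star)) % m


def mul_qrns(p, q, m=M):
    """QRNS product (3.7): two independent multiplications, no cross-products."""
    return (p[0] * q[0]) % m, (p[1] * q[1]) % m


a, b = (3, 4), (10, 5)
print(mul_conventional(a, b))                          # (10, 4)
print(from_qrns(mul_qrns(to_qrns(a), to_qrns(b))))     # (10, 4)
```

The forward and inverse mappings are paid once at the input and output of a computation, so the saving grows with the number of multiplications performed in between.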

Up to this point, the QRNS has been defined in one modulus
channel with modulus m. Consider now an L-moduli system with moduli
$\{m_1, m_2, \ldots, m_L\}$ where any two moduli are relatively prime (i.e.,
$\mathrm{GCD}(m_i, m_j) = 1$ for $i \ne j$, where GCD(a, b) stands for the Greatest
Common Divisor of a and b). A complex number Z = x + jy, with
$x, y \in Z_M$, $M = \prod_{i=1}^{L} m_i$, has the following L-tuple representation:

$$Z = x + jy; \qquad Z = (Z_1, Z_2, \ldots, Z_L) \tag{3.8}$$

$$Z_i = x_i + jy_i; \qquad x_i = x \bmod m_i, \qquad y_i = y \bmod m_i.$$

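If $Z^{(1)} = x^{(1)} + jy^{(1)}$ and $Z^{(2)} = x^{(2)} + jy^{(2)}$ are two such complex
numbers with $x^{(1)}, y^{(1)}, x^{(2)}, y^{(2)} \in Z_M$ (the parenthesized superscripts
index the operands), then the rules of composition for L-moduli
conventional RNS systems are

Addition:

$$Z^{(3)} = Z^{(1)} + Z^{(2)} = (x^{(1)} + jy^{(1)}) + (x^{(2)} + jy^{(2)}) = (x^{(3)} + jy^{(3)}) \tag{3.9}$$

with

$$x_i^{(3)} = (x_i^{(1)} + x_i^{(2)}) \bmod m_i$$
$$y_i^{(3)} = (y_i^{(1)} + y_i^{(2)}) \bmod m_i$$
$$i = 1, 2, \ldots, L$$

Multiplication:

$$Z^{(3)} = Z^{(1)} \cdot Z^{(2)} = (x^{(1)} + jy^{(1)}) \cdot (x^{(2)} + jy^{(2)}) = (x^{(3)} + jy^{(3)}) \tag{3.10}$$

with

$$x_i^{(3)} = (x_i^{(1)} \cdot x_i^{(2)} - y_i^{(1)} \cdot y_i^{(2)}) \bmod m_i$$
$$y_i^{(3)} = (x_i^{(1)} \cdot y_i^{(2)} + x_i^{(2)} \cdot y_i^{(1)}) \bmod m_i$$
$$i = 1, 2, \ldots, L$$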

Again the requirements per modulus channel for complex
arithmetic are the same as in the single-modulus case: two real
additions in one level for the complex addition and four real
multiplications and two real additions in two levels for the complex
multiplication.

To be able to define the QRNS isomorphic mapping in such an
L-moduli environment, all the L second-order congruences
$x^2 \equiv -1 \bmod m_i$, i = 1, 2, ..., L, must be solvable.

Then the QRNS mapping $f_2$ is as shown below:

$$f_2: Z \leftrightarrow (z, z^*); \qquad Z = x + jy; \qquad j = \sqrt{-1} \tag{3.11}$$

$$z = (z_1, z_2, \ldots, z_L); \qquad z^* = (z_1^*, z_2^*, \ldots, z_L^*)$$
$$z_i = (x_i + j_i \cdot y_i) \bmod m_i$$
$$z_i^* = (x_i - j_i \cdot y_i) \bmod m_i$$
$$x_i = (2^{-1}(z_i + z_i^*)) \bmod m_i$$
$$y_i = (2^{-1} j_i^{-1}(z_i - z_i^*)) \bmod m_i$$

where $j_i$ is a solution of the second-order congruence $x^2 \equiv -1 \bmod m_i$,
that is, it must satisfy $j_i^2 \equiv -1 \bmod m_i$ for i = 1, 2, ..., L. Needless
to say, in order for the QRNS isomorphism to exist,
the existence of $2^{-1}$, $j^{-1}$, and $j_i^{-1}$ in equations (3.2) and (3.11) must
be guaranteed. This problem is examined later in this
chapter.

The rules of composition in the L-moduli QRNS of equations
(3.11) are

Addition:

$$(z^{(1)}, z^{*(1)}) + (z^{(2)}, z^{*(2)}) = (z^{(3)}, z^{*(3)}) \tag{3.12}$$

where

$$z^{(3)} = (z_1^{(3)}, z_2^{(3)}, \ldots, z_L^{(3)})$$
$$z^{*(3)} = (z_1^{*(3)}, z_2^{*(3)}, \ldots, z_L^{*(3)})$$
$$z_i^{(3)} = (z_i^{(1)} + z_i^{(2)}) \bmod m_i$$
$$z_i^{*(3)} = (z_i^{*(1)} + z_i^{*(2)}) \bmod m_i$$
$$i = 1, 2, \ldots, L$$

Multiplication:

$$(z^{(1)}, z^{*(1)}) \cdot (z^{(2)}, z^{*(2)}) = (z^{(3)}, z^{*(3)}) \tag{3.13}$$

$$z_i^{(3)} = (z_i^{(1)} \cdot z_i^{(2)}) \bmod m_i$$
$$z_i^{*(3)} = (z_i^{*(1)} \cdot z_i^{*(2)}) \bmod m_i$$
$$i = 1, 2, \ldots, L$$

Again the QRNS in an L-moduli case outperforms, by far, the
conventional RNS in that only two real multiplications in one level
are required per modulus channel for the computation of a complex
product.

For the QRNS mapping $f_2$ to exist, the second-order congruence
$x^2 \equiv -1 \bmod m$ must be solvable. The following theorems
provide necessary and sufficient conditions for the solution of
$x^2 \equiv -1 \bmod m$.

Theorem 3.1: If m = p, a prime number, then the necessary and
sufficient condition that the congruence $x^2 \equiv -1 \bmod p$ is solvable in
$Z_p$ is that p = 4k + 1, where k is a positive integer. In this case,
the above congruence has two solutions, named j and $j^*$, which are
additive and multiplicative inverses mod p, or $j^* = -j = j^{-1} \bmod p$.

Proof: For a proof, the reader is directed to several references on
algebra as well as on number theory: [Lip81a, Her75a, Arm80a,
Wri39a, Sta70a, Sch86a]. A proof of this theorem is also
provided in Chapter 4 of this document.

To prove that the two solutions of $x^2 \equiv -1 \bmod p$ are additive and
multiplicative inverses, consider j to be one solution of the above
equation. Then

$$j^2 \equiv -1 \bmod p \tag{3.14}$$

But $(-j)^2 = j^2 \equiv -1 \bmod p$, and this means that the additive inverse of
j, named -j, is also a solution. Multiplying both sides of equation
(3.14) by $j^{-2}$ we arrive at

$$j^{-2} \cdot j^2 \equiv -j^{-2} \bmod p$$

or

$$j^{-2} \equiv -1 \bmod p$$

which shows that $j^{-1}$ is a solution of $x^2 \equiv -1 \bmod p$. To
prove that $-j = j^{-1}$, observe that

$$-j = -j^2 \cdot j^{-1} \tag{3.15}$$

But since $j^2 \equiv -1$, equation (3.15) becomes $-j = j^{-1}$.

The existence of $2^{-1}$ and $j^{-1}$, which is necessary for the QRNS
isomorphism, is guaranteed by virtue of p being prime.

The two solutions j and $j^*$ are called "quadratic roots" of the
equation $x^2 \equiv -1 \bmod p$. The number -1 is called a "quadratic residue."

It is interesting to observe that since all the prime numbers p
(besides the number 2) are odd, they are of the form p = 4k + 1 or
p = 4k + 3, and from Theorem 3.1 a prime modulus choice of p = 4k + 3
will never allow the solution of $x^2 \equiv -1 \bmod p$. The following
Theorem 3.2 examines the case of a non-prime modulus m.
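
Theorem 3.1 is easy to check numerically for small primes. The following sketch uses exhaustive search, and its function names are assumptions for illustration; it lists the roots of $x^2 \equiv -1 \bmod p$ and confirms that they exist only for primes of the form 4k + 1.

```python
def roots_of_minus_one(m: int):
    """All x in Z_m with x*x = -1 (mod m), found by exhaustive search."""
    return [x for x in range(m) if (x * x + 1) % m == 0]


def is_prime(n: int) -> bool:
    """Trial-division primality test, adequate for small n."""
    return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))


for p in (5, 13, 41, 7, 43):
    # p % 4 == 1 gives two roots j and p - j; p % 4 == 3 gives none.
    print(p, p % 4, roots_of_minus_one(p))
```

For p = 41 the search returns 9 and 32, and 9 · 32 = 288 = 7 · 41 + 1, so the two roots are both additive and multiplicative inverses of each other, as the theorem states.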

Theorem 3.2: If m is an odd integer with a prime decomposition
$m = p_1^{e_1} \cdot p_2^{e_2} \cdots p_n^{e_n}$ ($p_1, \ldots, p_n$ primes), then the necessary and
sufficient condition that the congruence $x^2 \equiv -1 \bmod m$ is solvable in
$Z_m$ is that each $p_i$ is of the form $p_i = 4k_i + 1$, i = 1, ..., n, where
$k_i$ is a positive integer. Again, each solution j is paired with -j,
which is both its additive and its multiplicative inverse mod m.

Proof: Proof of the above theorem will be presented in Chapter 4,
as a subcase of a more general argument.
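
The condition of Theorem 3.2 can be compared against a brute-force search for small odd moduli. This is a numerical check, not a proof, and the function names are assumptions for illustration.

```python
def prime_factors(m: int):
    """Set of distinct prime factors of m (trial division)."""
    factors, d = set(), 2
    while d * d <= m:
        while m % d == 0:
            factors.add(d)
            m //= d
        d += 1
    if m > 1:
        factors.add(m)
    return factors


def solvable_by_theorem(m: int) -> bool:
    """Theorem 3.2 condition for odd m: every prime factor has the form 4k + 1."""
    return all(p % 4 == 1 for p in prime_factors(m))


def roots_of_minus_one(m: int):
    """All x in Z_m with x*x = -1 (mod m), found by exhaustive search."""
    return [x for x in range(m) if (x * x + 1) % m == 0]


for m in (25, 65, 15, 35):
    print(m, solvable_by_theorem(m), roots_of_minus_one(m))
```

Note that a modulus with several distinct prime factors has more than two roots: m = 65 = 5 · 13 yields 8, 18, 47, and 57. They still pair up as j and -j, and within each pair the two roots are multiplicative inverses.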

One more theorem makes the theoretical study of the QRNS more
complete.

Theorem 3.3: The QRNS mapping $f_2$ from C(m) onto $Z_m \times Z_m$ defined by
equations (3.1) and (3.2) is an isomorphism.

Proof: For $f_2$ to be an isomorphism it must satisfy

(i) $f_2$ must be one-to-one and onto [Her75a]

(ii) for elements $Z_1, Z_2 \in C(m)$ it must be

$$f_2(Z_1 + Z_2) = f_2(Z_1) + f_2(Z_2)$$

$$f_2(Z_1 \cdot Z_2) = f_2(Z_1) \cdot f_2(Z_2) \quad \text{[Her75a]}$$

Proof of (i): The number of elements in C(m) is $m^2$ and so is the
number of elements in $Z_m \times Z_m$. It must now be shown that for
$Z_1, Z_2 \in C(m)$

$$Z_1 \ne Z_2 \Rightarrow f_2(Z_1) \ne f_2(Z_2) \tag{3.16}$$

Suppose, to the contrary, that for $Z_1 = x_1 + jy_1$, $Z_2 = x_2 + jy_2$, with
$x_1, y_1, x_2, y_2 \in Z_m$,

$$Z_1 \ne Z_2 \ \text{and} \ f_2(Z_1) = f_2(Z_2) \tag{3.17}$$

But

$$Z_1 \ne Z_2 \Leftrightarrow x_1 \ne x_2 \ \text{or} \ y_1 \ne y_2 \Leftrightarrow x_1 - x_2 \not\equiv 0 \bmod m \ \text{or} \ y_1 - y_2 \not\equiv 0 \bmod m \tag{3.18}$$

$$\Leftrightarrow x_1 - x_2 = k_1 \cdot m + k_2 \ \text{and} \ y_1 - y_2 = k_3 \cdot m + k_4$$

with $k_1, k_3$ integers and $k_2, k_4 \in \{0, 1, \ldots, m - 1\}$, where at least
one of $k_2$, $k_4$ is not zero.

For $f_2(Z_1)$ and $f_2(Z_2)$ equation (3.1) gives

$$f_2(Z_1) = [(x_1 + jy_1) \bmod m,\ (x_1 - jy_1) \bmod m] \tag{3.19}$$

$$f_2(Z_2) = [(x_2 + jy_2) \bmod m,\ (x_2 - jy_2) \bmod m]$$

where $j^2 \equiv -1 \bmod m$.

For $f_2(Z_1) = f_2(Z_2)$ it must be

$$x_1 + jy_1 \equiv (x_2 + jy_2) \bmod m \ \text{and} \ x_1 - jy_1 \equiv (x_2 - jy_2) \bmod m \tag{3.20}$$

$$\Leftrightarrow (x_1 - x_2) + j(y_1 - y_2) \equiv 0 \bmod m \ \text{and} \ (x_1 - x_2) - j(y_1 - y_2) \equiv 0 \bmod m$$

Substituting in equation (3.20) for $x_1 - x_2$ and $y_1 - y_2$ taken
from equation (3.18) we get

$$k_1 m + k_2 + j(k_3 m + k_4) \equiv 0 \bmod m \ \text{and} \ k_1 m + k_2 - j(k_3 m + k_4) \equiv 0 \bmod m$$

$$\Leftrightarrow k_1 m + k_2 + j(k_3 m + k_4) = k_5 \cdot m \ \text{and} \ k_1 m + k_2 - j(k_3 m + k_4) = k_6 \cdot m$$

$$\Leftrightarrow k_2 + jk_4 = (k_5 - k_1 - jk_3) \cdot m \ \text{and} \ k_2 - jk_4 = (k_6 - k_1 + jk_3) \cdot m$$

where $k_5$, $k_6$ are integers.

Assigning $K_1 = k_5 - k_1 - jk_3$ and $K_2 = k_6 - k_1 + jk_3$ we get

$$k_2 + jk_4 = K_1 \cdot m \tag{3.21}$$

$$k_2 - jk_4 = K_2 \cdot m$$

The addition of both sides of the two equations (3.21) gives

$$k_2 = 2^{-1} \cdot (K_1 + K_2) \cdot m \tag{3.22}$$

while the subtraction results in

$$k_4 = 2^{-1} \cdot j^{-1} \cdot (K_1 - K_2) \cdot m \tag{3.23}$$

where the inverses are taken modulo m. Both right-hand sides are
multiples of m, so both $k_2$ and $k_4$ are zero mod m. Since $k_2$ and $k_4$
lie in $\{0, 1, \ldots, m - 1\}$, both must equal zero, which contradicts the
previous statement that at least one of $k_2$, $k_4$ is not zero.

The contradiction occurred due to the assumption that $f_2(Z_1) =
f_2(Z_2)$. So it must be $f_2(Z_1) \ne f_2(Z_2)$.

Proof of (ii): Assume $Z_1 = x_1 + jy_1$, $Z_2 = x_2 + jy_2$, with
$x_1, y_1, x_2, y_2 \in Z_m$. Then equation (3.1) gives

$$f_2(Z_1) = [(x_1 + jy_1) \bmod m,\ (x_1 - jy_1) \bmod m] \tag{3.24}$$

$$f_2(Z_2) = [(x_2 + jy_2) \bmod m,\ (x_2 - jy_2) \bmod m]$$

with j such that $j^2 \equiv -1 \bmod m$. So

$$f_2(Z_1) + f_2(Z_2) = [[(x_1 + x_2) + j(y_1 + y_2)] \bmod m,\ [(x_1 + x_2) - j(y_1 + y_2)] \bmod m] \tag{3.25}$$

Also, $Z_1 + Z_2 = (x_1 + x_2) + j(y_1 + y_2)$ and from (3.1)

$$f_2(Z_1 + Z_2) = [[(x_1 + x_2) + j(y_1 + y_2)] \bmod m,\ [(x_1 + x_2) - j(y_1 + y_2)] \bmod m] \tag{3.26}$$

A comparison of equations (3.25) and (3.26) results in
$f_2(Z_1 + Z_2) = f_2(Z_1) + f_2(Z_2)$. For the multiplication

$$Z_1 \cdot Z_2 = (x_1 \cdot x_2 - y_1 \cdot y_2) \bmod m + j(x_1 \cdot y_2 + x_2 \cdot y_1) \bmod m$$

and from equation (3.1)

$$f_2(Z_1 \cdot Z_2) = [((x_1 \cdot x_2 - y_1 \cdot y_2) + j(x_1 \cdot y_2 + x_2 \cdot y_1)) \bmod m,\ ((x_1 \cdot x_2 - y_1 \cdot y_2) - j(x_1 \cdot y_2 + x_2 \cdot y_1)) \bmod m] \tag{3.27}$$

But from equation (3.24)

$$f_2(Z_1) \cdot f_2(Z_2) = [(x_1 + jy_1) \cdot (x_2 + jy_2) \bmod m,\ (x_1 - jy_1) \cdot (x_2 - jy_2) \bmod m]$$

$$= [[x_1 \cdot x_2 + j^2 y_1 \cdot y_2 + j(x_1 \cdot y_2 + x_2 \cdot y_1)] \bmod m,\ [x_1 \cdot x_2 + j^2 y_1 \cdot y_2 - j(x_1 \cdot y_2 + x_2 \cdot y_1)] \bmod m]$$

Since $j^2 \equiv -1 \bmod m$, then

$$f_2(Z_1) \cdot f_2(Z_2) = [((x_1 \cdot x_2 - y_1 \cdot y_2) + j(x_1 \cdot y_2 + x_2 \cdot y_1)) \bmod m,\ ((x_1 \cdot x_2 - y_1 \cdot y_2) - j(x_1 \cdot y_2 + x_2 \cdot y_1)) \bmod m] \tag{3.28}$$

and equations (3.27) and (3.28) prove that
$f_2(Z_1 \cdot Z_2) = f_2(Z_1) \cdot f_2(Z_2)$.

It is clear now that the proof of Theorem 3.3 justifies the
simplified rules of composition in the QRNS, equation (3.7).
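
For a small modulus, the two properties established in the proof can also be confirmed by exhaustive enumeration. The sketch below is a numerical sanity check under assumed values (m = 13, j = 5) with assumed function names; it does not replace the proof.

```python
from itertools import product

M, J = 13, 5     # hypothetical channel: 5 * 5 = 25 = -1 (mod 13)


def f2(Z):
    """Forward mapping (3.1) applied to Z = (x, y)."""
    x, y = Z
    return (x + J * y) % M, (x - J * y) % M


def add_c(a, b):
    """Addition in C(m), equation (3.6)."""
    return (a[0] + b[0]) % M, (a[1] + b[1]) % M


def mul_c(a, b):
    """Multiplication in C(m), equation (3.6)."""
    return (a[0] * b[0] - a[1] * b[1]) % M, (a[0] * b[1] + b[0] * a[1]) % M


def add_q(p, q):
    """Componentwise QRNS addition, equation (3.7)."""
    return (p[0] + q[0]) % M, (p[1] + q[1]) % M


def mul_q(p, q):
    """Componentwise QRNS multiplication, equation (3.7)."""
    return (p[0] * q[0]) % M, (p[1] * q[1]) % M


elements = list(product(range(M), repeat=2))
bijective = len({f2(Z) for Z in elements}) == M * M       # property (i)
homomorphic = all(                                         # property (ii)
    f2(add_c(a, b)) == add_q(f2(a), f2(b)) and f2(mul_c(a, b)) == mul_q(f2(a), f2(b))
    for a in elements for b in elements
)
print(bijective, homomorphic)   # True True
```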

Theorem 3.4: The mapping $f_2^{-1}$ such that $(z, z^*) \rightarrow (x + jy)$,
which is described by equation (3.2), is the inverse mapping of $f_2$
described by equation (3.1).

Proof: Consider $Z \in C(m)$ where Z = x + jy. Then by equation (3.1)
$f_2(Z) = [(x + jy) \bmod m,\ (x - jy) \bmod m]$. But from equation (3.2) we
get

$$f_2^{-1}(f_2(Z)) = f_2^{-1}[(x + jy) \bmod m,\ (x - jy) \bmod m]$$

$$= [2^{-1}(x + jy + x - jy) \bmod m,\ 2^{-1} j^{-1}(x + jy - x + jy) \bmod m]$$

$$= [2^{-1} \cdot 2x,\ 2^{-1} j^{-1} \cdot 2jy] = (x, y) = Z$$

The proof is completed because it has been shown that

$$f_2^{-1}(f_2(Z)) = Z$$

Finally, it is necessary to mention the existence of the entries
$j^{-1}$ and $2^{-1}$ that are required for the inverse QRNS mapping $f_2^{-1}$
(equation (3.2)). The proof for the existence of $j^{-1}$ is covered by
the proof of Theorem 4.12 (Chapter 4), while the existence of $2^{-1}$
is proven by Theorem 4.14 (Chapter 4), for N = 2.

The following two examples clarify the QRNS.

Example 3.1: Consider a single-modulus channel with modulus p = 41.
This modulus is prime and of the form p = 4k + 1 (41 = 4 · 10 + 1).
If the two complex numbers to be multiplied are $Z_1 = 4 + j3$
and $Z_2 = 6 + j5$, the product in the conventional RNS will be

$$Z_1 \cdot Z_2 = ((4 \cdot 6 - 3 \cdot 5) \bmod 41,\ (4 \cdot 5 + 3 \cdot 6) \bmod 41) = (9, 38) = 9 + j38.$$

For the above product to be computed in the QRNS, the solutions
of $x^2 \equiv -1 \bmod 41$ are required. These solutions are j = 9 and
$-j = j^{-1} = 32$.

$$f_2(Z_1) = ((4 + 9 \cdot 3) \bmod 41,\ (4 - 9 \cdot 3) \bmod 41) = (31 \bmod 41,\ -23 \bmod 41) = (31,\ 41 - 23) = (31, 18)$$

$$f_2(Z_2) = ((6 + 9 \cdot 5) \bmod 41,\ (6 - 9 \cdot 5) \bmod 41) = (51 \bmod 41,\ -39 \bmod 41) = (10, 2)$$

$$f_2(Z_1) \cdot f_2(Z_2) = f_2(Z_1 \cdot Z_2) = (31 \cdot 10 \bmod 41,\ 18 \cdot 2 \bmod 41) = (23, 36)$$

The computation of the inverse mapping $f_2^{-1}(23, 36)$ calls for
$2^{-1}$ and $j^{-1}$. They are $2^{-1} = 21 \bmod 41$ and $j^{-1} = 32 \bmod 41$. So

$$f_2^{-1}(23, 36) = (21 \cdot (23 + 36) \bmod 41,\ 21 \cdot 32 \cdot (23 - 36) \bmod 41) = (9, 38) = 9 + j38,$$

the correct product.


Example 3.2: A three-moduli system is considered with $\{m_1, m_2, m_3\}$
= {5, 13, 17}, all prime and of the form 4k + 1. The two complex
numbers to be multiplied are $Z_1 = 3 + j4$ and $Z_2 = 10 + j5$.

Their QRNS representations in the three-moduli system are

$$f_2(Z_1) = \{(1, 0),\ (10, 9),\ (2, 4)\}$$
$$f_2(Z_2) = \{(0, 0),\ (9, 11),\ (13, 7)\}$$

$$f_2(Z_1) \cdot f_2(Z_2) = f_2(Z_1 \cdot Z_2) = \{(0, 0),\ (12, 8),\ (9, 11)\}$$
$$f_2^{-1}\{(0, 0),\ (12, 8),\ (9, 11)\} = \{(0, 0),\ (10, 3),\ (10, 4)\}$$

or

$$\mathrm{Re}(Z_1 \cdot Z_2) = \{0, 10, 10\} \ \text{and} \ \mathrm{Im}(Z_1 \cdot Z_2) = \{0, 3, 4\}$$

Finally, the application of the Chinese Remainder Theorem (CRT)
gives $\mathrm{Re}(Z_1 \cdot Z_2) = 10$, $\mathrm{Im}(Z_1 \cdot Z_2) = 55$, which is the correct
product [Kro83a].
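
Example 3.2 can be reproduced with a short script. The roots $j_i$ = 2, 5, 4 used below are the ones consistent with the representations printed in the example; the function names are assumptions made for illustration.

```python
from math import prod

MODULI = (5, 13, 17)
ROOTS = (2, 5, 4)      # j_i with j_i * j_i = -1 (mod m_i)


def crt(residues, moduli):
    """Reconstruct the integer in [0, M) from its residues."""
    M = prod(moduli)
    total = 0
    for r, m in zip(residues, moduli):
        Mi = M // m
        total += Mi * ((pow(Mi, -1, m) * r) % m)
    return total % M


def to_qrns(x, y):
    """Forward mapping (3.11) in every modulus channel."""
    return [((x + j * y) % m, (x - j * y) % m) for m, j in zip(MODULI, ROOTS)]


def mul_qrns(a, b):
    """Channel-wise product (3.13): two multiplications per channel."""
    return [((z1 * z2) % m, (s1 * s2) % m) for (z1, s1), (z2, s2), m in zip(a, b, MODULI)]


def from_qrns(pairs):
    """Inverse mapping (3.11) per channel, followed by the CRT."""
    xs, ys = [], []
    for (z, z_star), m, j in zip(pairs, MODULI, ROOTS):
        inv2, invj = pow(2, -1, m), pow(j, -1, m)
        xs.append((inv2 * (z + z_star)) % m)
        ys.append((inv2 * invj * (z - z_star)) % m)
    return crt(xs, MODULI), crt(ys, MODULI)


product_qrns = mul_qrns(to_qrns(3, 4), to_qrns(10, 5))
print(product_qrns)              # [(0, 0), (12, 8), (9, 11)]
print(from_qrns(product_qrns))   # (10, 55)
```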

The significant advantages obtained by the use of the QRNS for
complex multiplications (i.e., reducing the operation count from
four real multiplications and two real additions in two levels of
operations down to two real multiplications in one level of
operation) have been successfully applied in the design of a radix-4
FFT [Tay85b]. In this case, it has been found that if the QRNS is
not used, then 34 additions and multiplications in 4 levels of
operations are required per butterfly for the radix-4 FFT, while the
use of the QRNS brings the count of additions and multiplications
down to 22 in 3 levels. Therefore, a much faster and more compact
RNS-FFT can be realized in less hardware than previously thought
possible.


3.2 Scaling in the QRNS


Magnitude  scaling  is  an  operation  that  might  be  required  at  the 
output  level  in  both  the  conventional  complex  RNS  as  well  as  the 
QRNS.  Taylor  et  al.  [Tay85b]  observed  that  there  is  actually  no 
complexity  difference  between  scaling  in  a conventional  complex  RNS 
and  a QRNS. 


In a conventional RNS with moduli $\{m_1, \ldots, m_L\}$, $M = \prod_{i=1}^{L} m_i$, a complex number $Z = x + jy$; $Z = (Z_1, \ldots, Z_L)$ can be scaled by a scale factor $K$ using the following procedure.

Step 1: Convert from RNS to decimal using the Chinese Remainder Theorem (CRT)

$$x = \left[\sum_{i=1}^{L} M_i \left(M_i^{-1} x_i\right) \bmod m_i\right] \bmod M \tag{3.29}$$

$$y = \left[\sum_{i=1}^{L} M_i \left(M_i^{-1} y_i\right) \bmod m_i\right] \bmod M \tag{3.30}$$

with $M_i = M/m_i$ and $M_i^{-1}$ the multiplicative inverse of $M_i \bmod m_i$, or $(M_i \cdot M_i^{-1}) \bmod m_i = 1$.

Step 2: Scale decimal $x$ and $y$ by a scale factor $K$

$$\tilde{x} = \left[\frac{x}{K}\right], \qquad \tilde{y} = \left[\frac{y}{K}\right] \tag{3.31}$$

where $[\cdot]$ denotes rounding to the closest integer.

Step 3: Convert back into RNS form

$$\tilde{x} \xrightarrow{\text{RNS}} (\tilde{x}_1, \tilde{x}_2, \ldots, \tilde{x}_L); \qquad \tilde{y} \xrightarrow{\text{RNS}} (\tilde{y}_1, \tilde{y}_2, \ldots, \tilde{y}_L) \tag{3.32}$$

In a QRNS environment, a complex number $(z, z^*)$; $z = (z_1, z_2, \ldots, z_L)$, $z^* = (z_1^*, z_2^*, \ldots, z_L^*)$ can be scaled by $K$ using a similar procedure described below.

Step 1: Residue-to-decimal conversion using the CRT

$$z = \left[\sum_{i=1}^{L} M_i \left(M_i^{-1} z_i\right) \bmod m_i\right] \bmod M \tag{3.33}$$

$$z^* = \left[\sum_{i=1}^{L} M_i \left(M_i^{-1} z_i^*\right) \bmod m_i\right] \bmod M \tag{3.34}$$

with $M_i$, $M_i^{-1}$ defined as before.

Step 2: Scale decimal $z$ and $z^*$ by scale factor $K$

$$\tilde{z} = \left[\frac{z}{K}\right], \qquad \tilde{z}^* = \left[\frac{z^*}{K}\right] \tag{3.35}$$

Step 3: Re-encode in QRNS

$$\tilde{z} \rightarrow (\tilde{z}_1, \tilde{z}_2, \ldots, \tilde{z}_L), \qquad \tilde{z}^* \rightarrow (\tilde{z}_1^*, \tilde{z}_2^*, \ldots, \tilde{z}_L^*) \tag{3.36}$$

Observation of equations (3.29)-(3.36) indicates that the scaling procedure in the QRNS is not much different from scaling in the conventional RNS.
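The three scaling steps are the same for each of $x$, $y$, $z$ and $z^*$, so one routine can serve all four. The sketch below assumes unsigned values in $[0, M)$ and rounds halves upward; the moduli, the test value 837 and the function names are illustrative choices, not taken from the text.

```python
from math import prod


def crt(residues, moduli):
    """Step 1: residue-to-decimal conversion, as in equations (3.29)-(3.34)."""
    M = prod(moduli)
    return sum((M // m) * ((pow(M // m, -1, m) * r) % m)
               for r, m in zip(residues, moduli)) % M


def scale_residues(residues, moduli, K):
    """Decode with the CRT, round value / K, then re-encode as residues."""
    value = crt(residues, moduli)            # Step 1
    scaled = (2 * value + K) // (2 * K)      # Step 2: nearest integer, halves up
    return [scaled % m for m in moduli]      # Step 3


moduli = (5, 13, 17)
x_res = [837 % m for m in moduli]            # residues [2, 5, 4]
print(scale_residues(x_res, moduli, 8))      # residues of 105: [0, 1, 3]
```

In this sketch the full decode and re-encode dominate the cost, which matches the observation that the scaling procedure has the same structure in both number systems.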

3.3  Quadratic  Like  Residue  Number  Systems 

Two more subjects will be briefly presented here. They are the Quadratic Like Residue Number System (QLRNS) and the Modified Quadratic Residue Number System (MQRNS). Both of these systems try to retain the reduced complexity of the QRNS while removing the restriction on the moduli choice ($p = 4k + 1$) that the QRNS requires.

The QLRNS was introduced by Soderstrand and Poe [Sod84a]. In such a system, the second-order congruence $x^2 \equiv -a \bmod p$ is considered instead of $x^2 \equiv -1 \bmod p$, which is used in the case of the QRNS.

A complex number $x + jy$ will now be presented in the QLRNS. The first step is to find the integers $m$ and $n$ such that

$$x + jy \approx m + n\,j\sqrt{a} \tag{3.37}$$

where the approximation represents truncation or rounding. In this case

$$m = x, \qquad n = y/\sqrt{a} \tag{3.38}$$

Then the integers $m$ and $n$ are mapped onto $z$ and $z^*$, where the forward and inverse mappings are shown by equations (3.39) and (3.40)

$$z = (m + n\,j\sqrt{a}) \bmod p, \qquad z^* = (m - n\,j\sqrt{a}) \bmod p \tag{3.39}$$

$$m = \left(2^{-1}(z + z^*)\right) \bmod p, \qquad n = \left((2\,j\sqrt{a})^{-1}(z - z^*)\right) \bmod p \tag{3.40}$$

where $j$ is the solution of $x^2 \equiv -1 \bmod p$.

The rules of composition in the QLRNS are

$$(z_1, z_1^*) + (z_2, z_2^*) = (z_1 + z_2,\ z_1^* + z_2^*)$$

$$(z_1, z_1^*) \cdot (z_2, z_2^*) = (z_1 z_2,\ z_1^* z_2^*) \tag{3.41}$$

Although the QLRNS keeps the simplified rules of composition of the QRNS with no restriction on the moduli choice, it has several disadvantages: (i) the mapping between $(x, y)$ and $(z, z^*)$ is no longer isomorphic and (ii) the QLRNS has a considerable dynamic range reduction.

The other QRNS-like system that has no moduli restriction is the so-called Modified Quadratic Residue Number System (MQRNS), introduced by Krishnan et al. [Kri86a]. In such a system, the restriction on the moduli is removed at the cost of increasing the number of real multiplications from two to three.

The MQRNS is based on the solution of $x^2 - n \equiv 0 \bmod m$. Consider a complex number $a + jb$ and the extension element $a^{(MQ)} = (A, A^*)$ where

$$A = (a + jb) \bmod m, \qquad A^* = (a - jb) \bmod m \tag{3.42}$$

with $j$ the solution of $x^2 - n \equiv 0 \bmod m$.

Then the modified residue ring $MQR(m) = [\{a^{(MQ)}\} : +, \cdot\,]$ has the following rules of composition:

Addition:
$$a^{(MQ)} + b^{(MQ)} = (A + B,\ A^* + B^*) \tag{3.43}$$

Multiplication:
$$a^{(MQ)} \cdot b^{(MQ)} = [(A \cdot B) - S,\ (A^* \cdot B^*) - S]$$

where

$$S = \left((j^2 + 1) \bmod m \cdot b \cdot d\right) \bmod m \tag{3.44}$$

with $b$, $d$ being the imaginary parts of the two complex numbers to be multiplied.

The complex multiplication can now be performed using the MQRNS as follows:

Consider two complex numbers $Z_1 = a + jb$ and $Z_2 = c + jd$ and let $j$ be a solution to $x^2 \equiv n \bmod m$.

Calculate

$$A = (a + jb) \bmod m, \qquad A^* = (a - jb) \bmod m \tag{3.45}$$

$$B = (c + jd) \bmod m, \qquad B^* = (c - jd) \bmod m$$

Let

$$Q = (A \cdot B - S) \bmod m, \qquad Q^* = (A^* \cdot B^* - S) \bmod m \tag{3.46}$$

where $S$ is given by equation (3.44).

Then the real and imaginary parts $e$ and $f$ of the complex product are given by

$$e = \left(2^{-1}(Q + Q^*)\right) \bmod m \tag{3.47}$$

$$f = \left((2j)^{-1}(Q - Q^*)\right) \bmod m$$

Although the MQRNS does not restrict the moduli choice, the reduced complexity of the original QRNS is now lost.
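The MQRNS product can be sketched in Python as follows. The modulus $m = 7$ and the value $j = 3$ (a solution of $x^2 \equiv 2 \bmod 7$) are illustrative assumptions, chosen because 7 is of the form $4k + 3$ and therefore cannot be used in the QRNS. The imaginary part is recovered with $(2j)^{-1}$, since $Q - Q^* = 2j(ad + bc)$.

```python
def mqrns_multiply(z1, z2, j, m):
    """Multiply a + jb and c + jd modulo m via the MQRNS rules.

    j is any integer with 2*j invertible modulo m; j*j mod m plays the role of n.
    """
    (a, b), (c, d) = z1, z2
    A, A_star = (a + j * b) % m, (a - j * b) % m
    B, B_star = (c + j * d) % m, (c - j * d) % m
    S = (((j * j + 1) % m) * b * d) % m          # correction term S
    Q = (A * B - S) % m
    Q_star = (A_star * B_star - S) % m
    e = (pow(2, -1, m) * (Q + Q_star)) % m       # real part
    f = (pow(2 * j, -1, m) * (Q - Q_star)) % m   # imaginary part
    return e, f


# (1 + 2j)(3 + j) = 1 + 7j, which reduces to (1, 0) modulo 7.
print(mqrns_multiply((1, 2), (3, 1), 3, 7))      # (1, 0)
```

The three real multiplications mentioned in the text appear here as `A * B`, `A_star * B_star` and the `b * d` product inside `S`.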


CHAPTER  4 

THE  POLYNOMIAL  RESIDUE  NUMBER  SYSTEM  (PRNS) 


4.1 The PRNS as an Extension of the QRNS

In the following discussion it will be shown that the product of two complex numbers is equivalent to the polynomial product of two first-order polynomials taken mod $(x^2 + 1)$.

Consider $A, B \in C(m)$ to be two complex numbers $A = a_0 + ja_1$, $B = b_0 + jb_1$, $j = \sqrt{-1}$, $a_0, a_1, b_0, b_1 \in Z_m$. The product of these complex numbers in $C(m)$ is given by the equation

$$C = A \cdot B = (a_0 + ja_1) \cdot (b_0 + jb_1) = (a_0 b_0 - a_1 b_1) \bmod m + j\,(a_0 b_1 + a_1 b_0) \bmod m \tag{4.1}$$

Consider now the two first-order polynomials $A(x)$, $B(x)$ with coefficients in $Z_m$, where $A(x) = a_0 + a_1 x$, $B(x) = b_0 + b_1 x$, $a_0, a_1, b_0, b_1 \in Z_m$. The product of these two polynomials, with arithmetic performed in $Z_m$ and the polynomial multiplication taken modulo $(x^2 + 1)$, is provided by

$$C(x) = A(x) \cdot B(x) \bmod (x^2 + 1) = (a_0 + a_1 x) \cdot (b_0 + b_1 x) \bmod (x^2 + 1) \tag{4.2}$$

$$= \left[a_0 b_0 \bmod m + [(a_0 b_1 + a_1 b_0) \bmod m]\,x + (a_1 b_1 \bmod m)\,x^2\right] \bmod (x^2 + 1)$$

$$= (a_0 b_0 - a_1 b_1) \bmod m + [(a_0 b_1 + a_1 b_0) \bmod m]\,x$$

A comparison of equations (4.1) and (4.2) shows that a complex multiplication is equivalent to a polynomial product of two first-order polynomials modulo $(x^2 + 1)$. Therefore, if the polynomial $x^2 + 1$ can be factored into two distinct factors modulo $m$, such that $(x^2 + 1) \equiv (x - j) \cdot (x + j) \bmod m$, then there exists an isomorphic mapping $f_2$ described by equations (3.1) which, for mod $m$ arithmetic, reduces the complexity of the polynomial product $A(x)B(x) \bmod (x^2 + 1) = [(a_0 + a_1 x)(b_0 + b_1 x) \bmod (x^2 + 1)]$ from four real multiplications and two real additions performed in two levels of operations down to two real multiplications performed in one level. The inverse mapping $f_2^{-1}$ is given by equations (3.2).

Extensions of the above idea will allow a product of two $(N-1)$-order polynomials modulo $(x^N + 1)$ to be performed with a multiplication complexity which meets Winograd's lower bound. Such extensions are discussed by Taylor [Tay86a] and are found to be of great interest in applications which involve the product of two polynomials modulo a third one [McC79a].

Extensions that involve the product of two $(N-1)$-order polynomials modulo $(x^N + 1)$ over some modular ring $Z_m$ define a Polynomial Residue Number System of order $N$, or PRNS($N$).

To obtain the lowest possible multiplication count, the polynomial $x^N + 1$ must be factorized into $N$ distinct factors in $Z_m$ as

$$x^N + 1 \equiv (x - r_0)(x - r_1) \cdots (x - r_{N-1}) \tag{4.3}$$

with $r_0, r_1, \ldots, r_{N-1} \in Z_m$. In other words, the Nth-order congruence $x^N + 1 \equiv 0 \bmod m$ must have $N$ distinct solutions $r_0, r_1, \ldots, r_{N-1} \in Z_m$. Then, there exists an isomorphic mapping $f_N$ of $P(m)$ onto $Z_m \times Z_m \times \cdots \times Z_m \triangleq Z_m^N$ called the PRNS($N$) isomorphic mapping, where $P(m)$ is a finite structure containing $(N-1)$-order polynomials with coefficients in $Z_m$, or

$$P(m) = \{A(x) = a_0 + a_1 x + \cdots + a_{N-1} x^{N-1} : a_i \in Z_m,\ i = 0, \ldots, N-1,\ \text{rules of composition shown by equations (4.5) and (4.6)}\} \tag{4.4}$$

For $A(x), B(x) \in P(m)$, with $A(x) = \sum_{i=0}^{N-1} a_i x^i$, $B(x) = \sum_{i=0}^{N-1} b_i x^i$, the rules of composition in $P(m)$ for arithmetic mod $m$ are

Addition:
$$C(x) = A(x) + B(x) = \sum_{i=0}^{N-1} [(a_i + b_i) \bmod m]\,x^i \tag{4.5}$$

Multiplication:
$$C(x) = [A(x) \cdot B(x) \bmod (x^N + 1)] \tag{4.6}$$


This isomorphic mapping $f_N$ satisfies the following equations:

Forward Mapping $f_N$:
$$A(x) = a_0 + a_1 x + \cdots + a_{N-1} x^{N-1} \xrightarrow{\ f_N\ } (a_0^*, a_1^*, \ldots, a_{N-1}^*)$$

Inverse Mapping $f_N^{-1}$:
$$(a_0^*, a_1^*, \ldots, a_{N-1}^*) \xrightarrow{\ f_N^{-1}\ } A(x) = a_0 + a_1 x + \cdots + a_{N-1} x^{N-1}$$

$$a_i^* = A(x) \bmod (x - r_i), \qquad i = 0, 1, \ldots, N-1 \tag{4.7}$$

where $r_i$ are the distinct roots of $x^N + 1 \equiv 0 \bmod m$,

$$A(x) = \left[\sum_{i=0}^{N-1} a_i^*\, Q_i(x)\right] \bmod (x^N + 1) \tag{4.8}$$

with $a_i, a_i^* \in Z_m$, $i = 0, 1, \ldots, N-1$ and

$$Q_i(x) = N^{-1}\left(1 + r_i^{-1} x + r_i^{-2} x^2 + \cdots + r_i^{-(N-2)} x^{N-2} + r_i^{-(N-1)} x^{N-1}\right) \tag{4.9}$$

where $Q_i(x)$ also satisfy

$$Q_i(x) \equiv \delta_{ij} \bmod (x - r_j); \qquad i, j = 0, 1, \ldots, N-1 \tag{4.10}$$

with $\delta_{ij} = 0$ if $i \neq j$ and $\delta_{ii} = 1$.

It is obvious that the inverse mapping $f_N^{-1}$ is defined in terms of the polynomial form of the Chinese Remainder Theorem (CRT).

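The rules of composition in $Z_m^N \triangleq Z_m \times Z_m \times \cdots \times Z_m$ are

Addition:
$$(a_0^*, a_1^*, \ldots, a_{N-1}^*) + (b_0^*, b_1^*, \ldots, b_{N-1}^*) = ((a_0^* + b_0^*) \bmod m,\ (a_1^* + b_1^*) \bmod m,\ \ldots,\ (a_{N-1}^* + b_{N-1}^*) \bmod m) \tag{4.11}$$

Multiplication:
$$(a_0^*, a_1^*, \ldots, a_{N-1}^*) \cdot (b_0^*, b_1^*, \ldots, b_{N-1}^*) = ((a_0^* \cdot b_0^*) \bmod m,\ (a_1^* \cdot b_1^*) \bmod m,\ \ldots,\ (a_{N-1}^* \cdot b_{N-1}^*) \bmod m) \tag{4.12}$$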


Although no justification has been offered for the forward and inverse PRNS mappings defined by equations (4.7) through (4.10), it will be shown by a set of theorems in section 4.3 that $f_N$ is an isomorphism and that $f_N^{-1}$ is the valid inverse. These theorems will also provide proof of the simplified rules of composition in the PRNS (equations (4.11) and (4.12)).

It must be noted here that the inverse mapping $f_N^{-1}$ calls for the existence of $N^{-1}$ and $r_i^{-j} \bmod m$ for $i = 0, \ldots, N-1$ and $j = 1, \ldots, N-1$ (equation (4.9)). At the end of section 4.2 the existence of $N^{-1}$ and $r_i^{-j}$ modulo $m$ will be proven.

Equation (4.12) makes clear that a polynomial product $A(x) \cdot B(x) \bmod (x^N + 1)$, where $A(x) = \sum_{i=0}^{N-1} a_i x^i$ and $B(x) = \sum_{i=0}^{N-1} b_i x^i$, requires $N$ multiplications mod $m$ and no additions if performed in $Z_m^N$. The same polynomial product needs $N^2$ multiplications mod $m$ and $N(N-1)$ additions if performed in $P(m)$. In addition, the computation of the polynomial product in $Z_m^N$ requires one level of operations, while more levels are required if it is performed in $P(m)$. To utilize the full potential of the PRNS, the polynomial $x^N + 1$ must be factored into $N$ distinct factors in $Z_m$. Section 4.2 provides all the necessary theoretical background for the choice of the modulus $m$ so that the factorization into $N$ distinct factors is feasible, while the validity of equations (4.7) through (4.12) is proven in section 4.3. It will also be demonstrated in section 4.3 that the QRNS isomorphic mapping of Chapter 3 is nothing more than a PRNS(2) mapping.
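The forward mapping, the component-wise product and the inverse mapping can be tried out with a small Python sketch. It assumes $N = 4$, $m = 17$ and the roots 2, 15, 8, 9 of $x^4 + 1 \equiv 0 \bmod 17$; the input polynomials and function names are arbitrary illustrations. The inverse uses the coefficient form of equations (4.8) and (4.9), namely $a_k = N^{-1} \sum_i a_i^* r_i^{-k} \bmod m$.

```python
M, N = 17, 4
ROOTS = (2, 15, 8, 9)      # the four roots of x^4 + 1 = 0 (mod 17)


def forward(a):
    """a_i* = A(r_i), the residue of A(x) modulo (x - r_i)."""
    return [sum(c * pow(r, k, M) for k, c in enumerate(a)) % M for r in ROOTS]


def inverse(a_star):
    """Coefficient k of A(x) is N^-1 * sum_i a_i* * r_i^-k (mod M)."""
    n_inv = pow(N, -1, M)
    return [n_inv * sum(s * pow(r, -k, M) for s, r in zip(a_star, ROOTS)) % M
            for k in range(N)]


def direct_product(a, b):
    """Schoolbook product in P(m), reduced with x^N = -1."""
    c = [0] * N
    for i, ai in enumerate(a):
        for k, bk in enumerate(b):
            sign = -1 if i + k >= N else 1
            c[(i + k) % N] = (c[(i + k) % N] + sign * ai * bk) % M
    return c


a, b = [1, 2, 3, 4], [5, 6, 7, 8]      # coefficients, lowest degree first
pointwise = [(x * y) % M for x, y in zip(forward(a), forward(b))]
print(inverse(pointwise) == direct_product(a, b))   # True
```

The `pointwise` line performs the $N$ independent multiplications of equation (4.12), while `direct_product` performs the $N^2$ multiplications needed in $P(m)$.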

Although some applications presented in later chapters involve the product of two $(N-1)$-order polynomials mod $x^N + 1$ over some modular ring $Z_m$, others involve the product of two such $(N-1)$-order polynomials mod $x^N - 1$ over $Z_m$. In the latter case, the minimum possible multiplication count is obtained by factoring the polynomial $x^N - 1$ into $N$ distinct factors in $Z_m$ as $x^N - 1 \equiv (x - r_0)(x - r_1) \cdots (x - r_{N-1})$ with $r_0, r_1, \ldots, r_{N-1} \in Z_m$. Then an Nth-order PRNS can be defined based on the factorization of $x^N - 1 \bmod m$, with theory similar to that of the PRNS based on the factorization of $x^N + 1 \bmod m$. This variation on the theme will be presented in section 4.4.

4.2 Theorems on the Modulus Choice $m$ so that $x^N + 1$
can be Factorized in N Distinct Factors modulo $m$

As already stated, the Nth-order PRNS mapping $f_N$ exists if the Nth-order congruence $x^N + 1 \equiv 0 \bmod m$ has $N$ distinct solutions $r_1, r_2, \ldots, r_N \in Z_m$ or, equivalently, the Nth-order polynomial $x^N + 1$ is factored into $N$ distinct first-order factors mod $m$ as in

$$x^N + 1 \equiv (x - r_1)(x - r_2) \cdots (x - r_N) \bmod m$$

The following theorems provide the necessary information for the modulus $m$ so that $x^N + 1$ can be factored into $N$ distinct first-order factors mod $m$.

Theorem 4.1: If $N$, $m$ are positive integers and the prime factorization of $m$ is $m = p_1^{e_1} \cdot p_2^{e_2} \cdots p_L^{e_L}$, with $N < p_i$, $p_i$ prime, $i = 1, 2, \ldots, L$, then there are distinct $r_1, r_2, \ldots, r_N \in Z_m$ such that $x^N + 1 \equiv (x - r_1)(x - r_2) \cdots (x - r_N) \bmod m$ if and only if $N \mid (p_i - 1)/2$, $i = 1, 2, \ldots, L$, where $a \mid b$ reads "a divides b."

Proof: Before the proof of Theorem 4.1 is provided, the following theorems are stated and proven, which provide the general background for the proof of Theorem 4.1.

Theorem 4.2: If $p$ is a prime number and $(c, p) = 1$, then the congruence $x^N \equiv c \bmod p$ has $(N, p-1)$ solutions or no solutions if and only if

$$c^{(p-1)/(N,\,p-1)} \equiv 1 \bmod p \quad \text{or} \quad c^{(p-1)/(N,\,p-1)} \not\equiv 1 \bmod p$$

respectively, where $(a, b) \triangleq$ Greatest Common Divisor of $(a, b)$.

Proof: It is given by Wright [Wri39a].

Theorem 4.3: The congruence $x^N \equiv -1 \bmod p$, for prime $p$, has $N$ distinct solutions if and only if $N \mid (p-1)/2$.

Proof: Referring to Theorem 4.2 we have $c \equiv -1 \bmod p$. Since it is desired to have $N$ distinct solutions, it must be $(N, p-1) = N$. Then by Theorem 4.2, the congruence $x^N \equiv -1 \bmod p$ has $(N, p-1) = N$ distinct solutions if and only if

$$(-1)^{(p-1)/N} \equiv 1 \bmod p \iff \frac{p-1}{N} \text{ is even} \iff \frac{p-1}{N} = 2\lambda \iff \lambda N = \frac{p-1}{2} \iff N \,\Big|\, \frac{p-1}{2}$$
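Theorem 4.3 can be checked numerically for small cases. The following sketch counts the roots of $x^N \equiv -1 \bmod p$ by exhaustive search and compares the outcome with the divisibility condition; the search ranges (odd primes below 200, $N$ up to 8) are arbitrary choices made for illustration.

```python
def count_roots(N, p):
    """Number of x in Z_p with x^N = -1 (mod p), by exhaustive search."""
    return sum(1 for x in range(p) if (pow(x, N, p) + 1) % p == 0)


def is_prime(n):
    return n > 1 and all(n % d for d in range(2, int(n ** 0.5) + 1))


def theorem_4_3_holds(N, p):
    """True when 'N distinct roots' and 'N divides (p-1)/2' agree."""
    return (count_roots(N, p) == N) == (((p - 1) // 2) % N == 0)


odd_primes = [p for p in range(3, 200) if is_prime(p)]
print(all(theorem_4_3_holds(N, p) for p in odd_primes for N in range(1, 9)))
```

A check of this kind is no substitute for the proof, but it is a quick way to catch a mistranscribed condition.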


Theorem 4.4: If $a \equiv b \bmod m$ and $n \mid m$, then $a \equiv b \bmod n$.

Proof: $n \mid m \Rightarrow m = \lambda n$.

Since $a \equiv b \bmod m$, the division of $a$ by $m$ leaves the same remainder as the division of $b$ by $m$, shown below

$$a = q_1 \cdot m + r_1, \qquad b = q_2 \cdot m + r_1, \qquad \text{with } r_1 \in \{0, 1, \ldots, m - 1\}$$

Taking into account that $m = \lambda n$ and also dividing $r_1$ by $n$ (so that $r_1 = q_3 \cdot n + r_2$), the above two equations become

$$a = q_1 \lambda n + q_3 n + r_2 = (q_1 \lambda + q_3) \cdot n + r_2$$

$$b = q_2 \lambda n + q_3 n + r_2 = (q_2 \lambda + q_3) \cdot n + r_2$$

$\Rightarrow a \equiv b \bmod n$.

Theorem 4.5: If the polynomial $x^N + 1$ can be factorized into $N$ distinct factors mod $m$ as $x^N + 1 \equiv (x - r_1)(x - r_2) \cdots (x - r_N) \bmod m$, then for any prime $p$ that is a factor of $m$ and such that $N < p$, the roots $r_i$, $i = 1, \ldots, N$ are distinct mod $p$.

Proof: Suppose that two of the $N$ numbers $r_i$, $r_j$ are congruent mod $p$, or $r_i \equiv r_j \bmod p$. Then it must be $x^N + 1 \equiv (x - r_i)^2 \cdot q_1(x) \bmod p$, for some $q_1(x)$.

Taking the derivatives of both sides of the above equation we get

$$\frac{d}{dx}(x^N + 1) \equiv \frac{d}{dx}\left[(x - r_i)^2\, q_1(x)\right] \bmod p$$

$$\Rightarrow N x^{N-1} \equiv \left[2(x - r_i)\, q_1(x) + (x - r_i)^2\, \frac{dq_1(x)}{dx}\right] \bmod p$$

$$\Rightarrow N x^{N-1} \equiv (x - r_i)\left[2\, q_1(x) + (x - r_i)\, \frac{dq_1(x)}{dx}\right] \bmod p$$

$$\Rightarrow N x^{N-1} \equiv (x - r_i)\, q_2(x) \bmod p, \quad \text{with } q_2(x) = 2\, q_1(x) + (x - r_i)\, \frac{dq_1(x)}{dx}$$

Evaluating the above equation for $x = r_i$ we obtain $N r_i^{N-1} \equiv 0 \bmod p$, and since $N < p \Rightarrow N \not\equiv 0 \bmod p$, then it must be $r_i^{N-1} \equiv 0 \bmod p$ or $r_i^N \equiv 0 \bmod p$, which contradicts the fact that since $r_i$ is a root of $x^N + 1 \equiv 0 \bmod p$, then $r_i^N \equiv -1 \bmod p$. The contradiction occurred because of the assumption that $r_i \equiv r_j \bmod p$. So $r_i$, $i = 1, \ldots, N$ are distinct mod $p$.


Theorem 4.6: If the polynomial $x^N + 1$ can be factorized into $N$ distinct factors modulo $m$ as $x^N + 1 \equiv (x - r_1) \cdots (x - r_N) \bmod m$, where $m = p_1^{e_1} \cdot p_2^{e_2} \cdots p_L^{e_L}$, with $N < p_i$, $p_i$ prime, $i = 1, \ldots, L$, then $N \mid (p_i - 1)/2$, $i = 1, \ldots, L$.

Proof: By Theorem 4.5, since the polynomial $x^N + 1$ can be factorized into $N$ distinct factors modulo $m$ and also $p_i \mid m$, $i = 1, \ldots, L$, then the roots $r_1, r_2, \ldots, r_N$ are distinct mod $p_i$, $i = 1, \ldots, L$. This means that

$$x^N + 1 \equiv (x - r_1)(x - r_2) \cdots (x - r_N) \bmod p_i, \qquad i = 1, \ldots, L$$

But then, from Theorem 4.3, it must be $N \mid (p_i - 1)/2$, $i = 1, \ldots, L$. Note that so far the "only if" part of Theorem 4.1 has been proven.

Theorem 4.7: If $f(c) \equiv 0 \bmod m$ then $f(x)$ can be written as $f(x) \equiv (x - c)\, p(x) \bmod m$.

Proof: The division of $f(x)$ by $(x - c)$ gives $f(x) \equiv (x - c)\, p(x) + r(x) \bmod m$, where degree $r(x) <$ degree $(x - c) = 1$. So (degree $r(x)$) $= 0 \Rightarrow r(x) = r =$ constant and $f(x) \equiv (x - c)\, p(x) + r$. For $x = c$ this gives $f(c) \equiv (c - c)\, p(c) + r \equiv 0 \Rightarrow 0 \equiv r \Rightarrow f(x) \equiv (x - c)\, p(x) \bmod m$.

Theorem 4.8: If $r_1, r_2, \ldots, r_N$ are roots of $f(x) \equiv 0 \bmod m$ (i.e. $f(r_j) \equiv 0 \bmod m$, $j = 1, \ldots, N$) and for every prime factor $p_\ell$ of $m = p_1^{e_1} \cdot p_2^{e_2} \cdots p_L^{e_L}$ the $r_j$ are all distinct modulo $p_\ell$, then $f(x)$ can be written as follows:

$$f(x) \equiv (x - r_1)(x - r_2) \cdots (x - r_N)\, p(x) \bmod m$$

Proof: The proof is provided by mathematical induction. For $N = 1$, $f(r_1) \equiv 0 \bmod m$ and by Theorem 4.7 it is $f(x) \equiv (x - r_1)\, p_1(x)$, so the result holds true for $N = 1$. Suppose that the result holds true for $N = k$. It is desired to prove that it also holds true for $N = k + 1$. In this case, by the hypothesis of the theorem, $r_1, r_2, \ldots, r_k, r_{k+1}$ are roots of $f(x) \equiv 0 \bmod m$ and they are all distinct mod $p_\ell$, $\ell = 1, \ldots, L$. Since $f(r_{k+1}) \equiv 0 \bmod m$, then by Theorem 4.7 it is $f(x) \equiv (x - r_{k+1})\, p_k(x) \bmod m$. Since $r_i$ is a root of $f(x) \equiv 0 \bmod m$ for every $i = 1, 2, \ldots, k$, then $f(r_i) \equiv (r_i - r_{k+1})\, p_k(r_i) \equiv 0 \bmod m$. The hypothesis of the theorem implies that all the $r_j$ are distinct mod $p_\ell$. Then, $r_i$ and $r_{k+1}$ are distinct mod $p_\ell$, which means $p_\ell \nmid (r_i - r_{k+1})$ ($a \nmid b$ reads as "a does not divide b"). The result is that

$$p_1 \nmid (r_i - r_{k+1}) \Rightarrow p_1^{e_1} \nmid (r_i - r_{k+1})$$

$$p_2 \nmid (r_i - r_{k+1}) \Rightarrow p_2^{e_2} \nmid (r_i - r_{k+1})$$

$$\vdots$$

$$p_L \nmid (r_i - r_{k+1}) \Rightarrow p_L^{e_L} \nmid (r_i - r_{k+1})$$

Finally, $m = p_1^{e_1} \cdot p_2^{e_2} \cdots p_L^{e_L} \nmid (r_i - r_{k+1})$. More strongly, since no prime factor of $m$ divides $(r_i - r_{k+1})$, this difference is relatively prime to $m$ and therefore has a multiplicative inverse modulo $m$. Taking into account that $(r_i - r_{k+1})\, p_k(r_i) \equiv 0 \bmod m$, it follows that $p_k(r_i) \equiv 0 \bmod m$, $i = 1, 2, \ldots, k$. This means that the distinct mod $p_\ell$ numbers $r_1, r_2, \ldots, r_k$ are roots of $p_k(x) \equiv 0 \bmod m$. By the inductive hypothesis that the result holds true for $N = k$, it follows that

$$p_k(x) \equiv (x - r_1)(x - r_2) \cdots (x - r_k)\, p(x) \bmod m$$

Since $f(x) \equiv (x - r_{k+1})\, p_k(x) \bmod m$, then $f(x) \equiv (x - r_1)(x - r_2) \cdots (x - r_k)(x - r_{k+1})\, p(x) \bmod m$, and the result holds true for $N = k + 1$. Therefore, it is true for every $N$ and the proof is completed.

Theorem 4.9: If $p$ is a prime number, $f(r) \equiv 0 \bmod p$ ($r$ is a root of $f(x) \equiv 0 \bmod p$) and $df(x)/dx\,|_{x=r} \not\equiv 0 \bmod p$, then there exists a unique $s$ satisfying $f(s) \equiv 0 \bmod p^e$ ($s$ is a root of $f(x) \equiv 0 \bmod p^e$) with $s \equiv r \bmod p$.

Proof: It is given by Hardy and Wright [Har60a].

Theorem 4.10: If $N$ is a positive integer and $p$ is prime and $N < p$, then there are $N$ distinct roots $r_1, r_2, \ldots, r_N \in Z_{p^e}$ of the polynomial $x^N + 1$ such that

$$x^N + 1 \equiv (x - r_1)(x - r_2) \cdots (x - r_N) \bmod p^e$$

if and only if $N \mid (p-1)/2$. This factorization of $x^N + 1$ into $N$ distinct factors is unique.

Proof: By Theorem 4.3, $x^N + 1$ has $N$ distinct roots mod $p$, named $x_1, x_2, \ldots, x_N$, if and only if $N \mid (p-1)/2$. Then $x^N + 1 \equiv (x - x_1)(x - x_2) \cdots (x - x_N) \bmod p$. From Theorem 4.9, since $p$ is prime and $f(x_i) \equiv 0 \bmod p$ for $i = 1, \ldots, N$ with $f(x) = x^N + 1$, it is concluded that for every $x_i$ there exists a unique $r_i$ satisfying $f(r_i) \equiv 0 \bmod p^e$ or $r_i^N + 1 \equiv 0 \bmod p^e$ with $r_i \equiv x_i \bmod p$ for $i = 1, 2, \ldots, N$. (The derivative condition of Theorem 4.9 holds because $N x_i^{N-1} \not\equiv 0 \bmod p$ when $N < p$.) Since $x_1, x_2, \ldots, x_N$ are all distinct mod $p$, then $r_1, r_2, \ldots, r_N$ are also distinct mod $p$. But then, from Theorem 4.8, $f(x) = x^N + 1$ can be written as

$$x^N + 1 \equiv (x - r_1)(x - r_2) \cdots (x - r_N)\, p(x) \bmod p^e$$

Since (degree $p(x)$) $= 0$ then $p(x)$ is a constant and since the coefficient of $x^N$ is 1, it follows that $p(x) = 1$ and $x^N + 1 \equiv (x - r_1)(x - r_2) \cdots (x - r_N) \bmod p^e$.

So far, it has been proven that if $N \mid (p-1)/2$, then the polynomial $x^N + 1$ can be factored into $N$ distinct factors mod $p^e$ as $x^N + 1 \equiv (x - r_1)(x - r_2) \cdots (x - r_N) \bmod p^e$.

The converse can be seen to be true by Theorem 4.6 and this completes the proof of the theorem.
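The lifting of a root from mod $p$ to mod $p^e$, which Theorems 4.9 and 4.10 guarantee to exist, can be sketched as a Newton-type iteration. This is one standard way of computing the lift and is not claimed to be the construction used in the cited proof; the function name and the example values ($N = 2$, $p = 5$, $e = 3$) are illustrative.

```python
def hensel_lift(r, N, p, e):
    """Lift a root r of x^N + 1 = 0 (mod p) to a root modulo p**e.

    Assumes p is prime and N < p, so f'(r) = N * r^(N-1) is nonzero mod p.
    """
    modulus, s = p, r % p
    for _ in range(e - 1):
        modulus *= p
        f = (pow(s, N, modulus) + 1) % modulus
        df_inv = pow(N * pow(s, N - 1, modulus), -1, modulus)
        s = (s - f * df_inv) % modulus       # one Newton step, s stays = r (mod p)
    return s


lifted = [hensel_lift(r, 2, 5, 3) for r in (2, 3)]
print(lifted)                                # [57, 68]
```

Here 2 and 3 are the roots of $x^2 + 1 \equiv 0 \bmod 5$, and 57 and 68 are the corresponding roots modulo $5^3 = 125$.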
Now that all the required background has been developed we return to the proof of Theorem 4.1, which is restated for completeness.

Theorem 4.1: If $N$, $m$ are positive integers and the prime factorization of $m$ is $m = p_1^{e_1} \cdot p_2^{e_2} \cdots p_L^{e_L}$, with $N < p_i$, $p_i$ prime, $i = 1, \ldots, L$, then there are distinct $r_1, r_2, \ldots, r_N \in Z_m$ such that $x^N + 1 \equiv (x - r_1)(x - r_2) \cdots (x - r_N) \bmod m$ if and only if $N \mid (p_i - 1)/2$ for every $i = 1, 2, \ldots, L$.

Proof: Theorem 4.6 establishes the "only if" part of Theorem 4.1 (the condition is necessary). The "if" part (the condition is sufficient) can be restated as follows:

If $N$, $m$ are positive integers with $m = p_1^{e_1} \cdot p_2^{e_2} \cdots p_L^{e_L}$, $N < p_i$, $p_i$ prime, and if $N \mid (p_i - 1)/2$ for every $i = 1, \ldots, L$, then $x^N + 1$ can be factorized into $N$ distinct factors mod $m$ as $x^N + 1 \equiv (x - r_1)(x - r_2) \cdots (x - r_N) \bmod m$. Such a factorization is not unique; there are $(N!)^{L-1}$ different factorizations. The proof of this part follows.

Since $N \mid (p_i - 1)/2$ for every $i = 1, \ldots, L$, then from Theorem 4.10 the congruence $x^N + 1 \equiv 0 \bmod p_i^{e_i}$ has $N$ roots mod $p_i^{e_i}$, say $r_{i1}, r_{i2}, \ldots, r_{iN}$. These roots are also unique.

Call $m_i = p_i^{e_i}$ and suppose that

$r_{11}, r_{12}, \ldots, r_{1N}$ are the $N$ distinct roots mod $m_1$ of $x^N + 1 \equiv 0 \bmod m_1$
$r_{21}, r_{22}, \ldots, r_{2N}$ are the $N$ distinct roots mod $m_2$ of $x^N + 1 \equiv 0 \bmod m_2$
$\vdots$
$r_{L1}, r_{L2}, \ldots, r_{LN}$ are the $N$ distinct roots mod $m_L$ of $x^N + 1 \equiv 0 \bmod m_L$

Consider the matrix $A$

$$A = \begin{pmatrix} r_{11} & r_{12} & \cdots & r_{1N} \\ r_{21} & r_{22} & \cdots & r_{2N} \\ \vdots & \vdots & & \vdots \\ r_{L1} & r_{L2} & \cdots & r_{LN} \end{pmatrix}$$

This matrix has $L$ rows and $N$ columns. Applying the Chinese Remainder Theorem for the elements of the $j$th column $r_{1j}, r_{2j}, \ldots, r_{Lj}$ we get

$$r_j = (r_{1j} M_1 N_1 + r_{2j} M_2 N_2 + \cdots + r_{Lj} M_L N_L) \bmod m$$

where $M_i = m/m_i$, $i = 1, 2, \ldots, L$ and $N_i M_i \equiv 1 \bmod m_i$.

Since $r_{ij}$ is a root of $x^N + 1 \equiv 0 \bmod m_i$, $i = 1, \ldots, L$ (it satisfies $r_{ij}^N \equiv -1 \bmod m_i$), then $r_j$ is a root of $x^N + 1 \equiv 0 \bmod m$ or it satisfies $r_j^N \equiv -1 \bmod m$. The other $N - 1$ roots of $x^N + 1 \equiv 0 \bmod m$ are found by applying the Chinese Remainder Theorem on the elements of the rest of the columns of matrix $A$. Then, $x^N + 1 \equiv 0 \bmod m$ has $N$ roots $r_1, r_2, \ldots, r_N$ obtained by the Chinese Remainder Theorem as described above. Besides this, the roots $r_1, \ldots, r_N$ are all distinct modulo $p_i$ for every $i = 1, \ldots, L$, and by using Theorem 4.8 $x^N + 1$ can be written as

$$x^N + 1 \equiv (x - r_1)(x - r_2) \cdots (x - r_N)\, p(x) \bmod m$$

Since $p(x) = 1$, we find $x^N + 1 \equiv (x - r_1)(x - r_2) \cdots (x - r_N) \bmod m$.

To prove that such a factorization is not unique, consider all the possible permutations of rows $2, 3, \ldots, L$ of matrix $A$. Since each row contains $N$ elements that are permuted and $L - 1$ rows are simultaneously permuted, the number of all the possible matrices $A$ is $(N!)^{L-1}$. Applying the Chinese Remainder Theorem for the elements of every column of each one of the $(N!)^{L-1}$ possible $A$ matrices we obtain all $(N!)^{L-1}$ possible factorizations of $x^N + 1 \bmod m$.

The  following  example  will  make  this  procedure  clear. 

Example 4.1: Factorize $x^2 + 1 \bmod 1105$.

Here, the order of the polynomial is $N = 2$ and the modulus is $m = 1105 = 5^1 \cdot 13^1 \cdot 17^1 = p_1^{e_1} \cdot p_2^{e_2} \cdot p_3^{e_3} = m_1 \cdot m_2 \cdot m_3$. Observe that

$$\frac{p_1 - 1}{2} = \frac{5 - 1}{2} = \frac{4}{2}, \qquad \frac{p_2 - 1}{2} = \frac{13 - 1}{2} = \frac{12}{2}, \qquad \frac{p_3 - 1}{2} = \frac{17 - 1}{2} = \frac{16}{2}$$

and also $N \mid (p_i - 1)/2$ for $i = 1, 2, 3$. Then Theorem 4.1 implies that $x^2 + 1$ must be factorized into two distinct factors modulo 1105 in $(N!)^{L-1} = (2!)^{3-1} = 4$ different ways. For $m_1 = p_1 = 5$ the congruence $x^2 + 1 \equiv 0 \bmod 5$ has two solutions $r_{11} = 2$ and $r_{12} = 3$ (because $2^2 + 1 \equiv 0 \bmod 5$, $3^2 + 1 \equiv 0 \bmod 5$). For $m_2 = p_2 = 13$ the congruence $x^2 + 1 \equiv 0 \bmod 13$ has the solutions $r_{21} = 5$ and $r_{22} = 8$, while for $m_3 = p_3 = 17$, $x^2 + 1 \equiv 0 \bmod 17$ has $r_{31} = 4$ and $r_{32} = 13$ as its solutions.

Consider the matrix $A_1$

$$A_1 = \begin{pmatrix} r_{11} & r_{12} \\ r_{21} & r_{22} \\ r_{31} & r_{32} \end{pmatrix}$$

By taking all possible permutations of rows 2 and 3 of matrix $A_1$ and leaving row 1 untouched we get the following four matrices (including $A_1$):

$$A_1 = \begin{pmatrix} r_{11} & r_{12} \\ r_{21} & r_{22} \\ r_{31} & r_{32} \end{pmatrix}, \quad A_2 = \begin{pmatrix} r_{11} & r_{12} \\ r_{22} & r_{21} \\ r_{31} & r_{32} \end{pmatrix}, \quad A_3 = \begin{pmatrix} r_{11} & r_{12} \\ r_{21} & r_{22} \\ r_{32} & r_{31} \end{pmatrix}, \quad A_4 = \begin{pmatrix} r_{11} & r_{12} \\ r_{22} & r_{21} \\ r_{32} & r_{31} \end{pmatrix}$$

Applying the Chinese Remainder Theorem on the columns of matrices $A_1$, $A_2$, $A_3$, $A_4$ we can get the four different sets of solutions of $x^2 + 1 \equiv 0 \bmod 1105$.

Matrix $A_1$ gives

$$r_1 = (r_{11} M_1 N_1 + r_{21} M_2 N_2 + r_{31} M_3 N_3) \bmod m$$

Here $m = 1105$, $M_1 = m/m_1 = 1105/5 = 221$ and $N_1$ is such that $N_1 \cdot M_1 \equiv 1 \bmod m_1 = 1 \bmod 5 \Rightarrow N_1 = 1$. Similarly $M_2 = m/m_2 = 1105/13 = 85$ and $N_2$ is such that $N_2 \cdot M_2 \equiv 1 \bmod 13$, and $N_2 = 2$, while $M_3 = 65$ and $N_3 = 11$. Since $(r_{11}, r_{21}, r_{31}) = (2, 5, 4)$ it becomes clear that

$$r_1 = (2 \cdot 221 \cdot 1 + 5 \cdot 85 \cdot 2 + 4 \cdot 65 \cdot 11) \bmod 1105 \quad \text{or} \quad r_1 = 837$$

while $r_2 = (r_{12} M_1 N_1 + r_{22} M_2 N_2 + r_{32} M_3 N_3) \bmod m$ and for $(r_{12}, r_{22}, r_{32}) = (3, 8, 13)$ we get

$$r_2 = (3 \cdot 221 \cdot 1 + 8 \cdot 85 \cdot 2 + 13 \cdot 65 \cdot 11) \bmod 1105 \quad \text{or} \quad r_2 = 268$$

So one factorization of $x^2 + 1 \bmod 1105$ is $x^2 + 1 \equiv (x - 837)(x - 268) \bmod 1105$. Note that indeed $r_1^2 + 1 = 837^2 + 1 \equiv 0 \bmod 1105$ and $r_2^2 + 1 = 268^2 + 1 \equiv 0 \bmod 1105$.

Matrix $A_2$ gives

$$r_3 = (r_{11} M_1 N_1 + r_{22} M_2 N_2 + r_{31} M_3 N_3) \bmod m = 242$$

and

$$r_4 = (r_{12} M_1 N_1 + r_{21} M_2 N_2 + r_{32} M_3 N_3) \bmod m = 863$$

for which $r_3^2 + 1 = 242^2 + 1 \equiv 0 \bmod 1105$ and $r_4^2 + 1 = 863^2 + 1 \equiv 0 \bmod 1105$, while matrix $A_3$ gives

$$r_5 = (r_{11} M_1 N_1 + r_{21} M_2 N_2 + r_{32} M_3 N_3) \bmod m = 642$$

$$r_6 = (r_{12} M_1 N_1 + r_{22} M_2 N_2 + r_{31} M_3 N_3) \bmod m = 463$$

with $r_5^2 + 1 = 642^2 + 1 \equiv 0 \bmod 1105$ and $r_6^2 + 1 = 463^2 + 1 \equiv 0 \bmod 1105$. Matrix $A_4$ results in

$$r_7 = (r_{11} M_1 N_1 + r_{22} M_2 N_2 + r_{32} M_3 N_3) \bmod m = 47$$

$$r_8 = (r_{12} M_1 N_1 + r_{21} M_2 N_2 + r_{31} M_3 N_3) \bmod m = 1058$$

for which $r_7^2 + 1 = 47^2 + 1 \equiv 0 \bmod 1105$ and $r_8^2 + 1 = 1058^2 + 1 \equiv 0 \bmod 1105$.

So the four different solutions of $x^2 + 1 \equiv 0 \bmod 1105$ are $\{837, 268\}$, $\{242, 863\}$, $\{642, 463\}$, $\{47, 1058\}$ and $x^2 + 1$ can be factored mod 1105 as

$$x^2 + 1 \equiv (x - 837)(x - 268) \bmod 1105$$

or

$$x^2 + 1 \equiv (x - 242)(x - 863) \bmod 1105$$

or

$$x^2 + 1 \equiv (x - 642)(x - 463) \bmod 1105$$

or

$$x^2 + 1 \equiv (x - 47)(x - 1058) \bmod 1105$$
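The row-permutation procedure of this example can be automated. The sketch below finds the roots modulo each prime power by exhaustive search (adequate for small moduli only), keeps the first row fixed, permutes the remaining rows, and applies the CRT column by column. The function names are illustrative assumptions.

```python
from itertools import permutations, product
from math import prod


def crt(residues, moduli):
    """Chinese Remainder Theorem for pairwise coprime moduli."""
    M = prod(moduli)
    return sum(r * (M // m) * pow(M // m, -1, m)
               for r, m in zip(residues, moduli)) % M


def root_sets(N, prime_powers):
    """All sets of N roots of x^N + 1 modulo the product of prime_powers."""
    rows = [[x for x in range(q) if (pow(x, N, q) + 1) % q == 0]
            for q in prime_powers]
    found = set()
    # Row 1 stays fixed; rows 2..L are permuted independently.
    for perms in product(*(permutations(row) for row in rows[1:])):
        matrix = [tuple(rows[0]), *perms]
        columns = zip(*matrix)
        found.add(frozenset(crt(col, prime_powers) for col in columns))
    return found


sets = root_sets(2, (5, 13, 17))
print(sorted(sorted(s) for s in sets))
# [[47, 1058], [242, 863], [268, 837], [463, 642]]
```

The number of sets returned equals $(N!)^{L-1} = 4$ for this modulus.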

Having proven the general theorems on the modulus choice $m$ such that $x^N + 1 \equiv 0 \bmod m$ can have $N$ distinct roots, it is a trivial matter to prove Theorems 3.1 and 3.2, which provide the modulus choice for the QRNS case.

Proof of Theorem 3.1: It is desired to prove that if $m = p$, a prime number, then the necessary and sufficient condition that the congruence $x^2 \equiv -1 \bmod p$ has two distinct solutions in $Z_p$ is that $p = 4k + 1$. Theorem 4.3 implies that the congruence $x^N \equiv -1 \bmod p$ for prime $p$ has $N$ distinct solutions if and only if $N \mid (p-1)/2$. For $N = 2$ it must be $2 \mid (p-1)/2 \iff (p-1)/2 = 2k \iff p = 4k + 1$, where $k$ is a positive integer. This completes the proof.

Proof of Theorem 3.2: Here it is needed to prove that if the prime decomposition of $m$ is $m = p_1^{e_1} \cdot p_2^{e_2} \cdots p_L^{e_L}$, with $p_i$ prime, $i = 1, 2, \ldots, L$, then the necessary and sufficient condition that the congruence $x^2 \equiv -1 \bmod m$ has two distinct solutions in $Z_m$ is that $p_i = 4k_i + 1$, $i = 1, 2, \ldots, L$.

Theorem 4.1 implies that the congruence $x^N \equiv -1 \bmod m$ has $N$ distinct solutions in $Z_m$ if and only if $N \mid (p_i - 1)/2$ for every $i = 1, 2, \ldots, L$. For $N = 2$ it must be $2 \mid (p_i - 1)/2 \iff (p_i - 1)/2 = 2k_i \iff p_i = 4k_i + 1$ with $k_i$ a positive integer, and the proof is completed.

As will be shown in later chapters, special attention will be given to Nth-order PRNS systems with $N$ being a power of two. The following corollaries provide information about the modulus choice in this case.

Corollary 4.1: If m = p, a prime number, and $N = 2^s$, s a positive integer, then the necessary and sufficient condition for the congruence $x^N \equiv -1 \pmod p$ to have N distinct solutions is that $p = 2^{s+1} k + 1$, k a positive integer.

Proof: Theorem 4.3 implies that the congruence $x^N \equiv -1 \pmod p$ for prime p has N distinct solutions if and only if $N \mid (p-1)/2$. For $N = 2^s$ it must be $2^s \mid (p-1)/2 \Leftrightarrow (p-1)/2 = 2^s \cdot k \Leftrightarrow p = 2^{s+1} k + 1$, where k is a positive integer.

Corollary 4.2: If the prime decomposition of m is $m = p_1^{e_1} \cdot p_2^{e_2} \cdots p_L^{e_L}$, $p_i$ prime, i = 1, 2, ..., L, and $N = 2^s$, s a positive integer, then the necessary and sufficient condition for the congruence $x^N \equiv -1 \pmod m$ to have N distinct solutions in $Z_m$ is that $p_i = 2^{s+1} k_i + 1$, $k_i$ a positive integer, i = 1, 2, ..., L.

Proof: Theorem 4.1 implies that the congruence $x^N \equiv -1 \pmod m$ has N distinct solutions in $Z_m$ if and only if $N \mid (p_i - 1)/2$ for every i = 1, 2, ..., L. For $N = 2^s$ it must be $2^s \mid (p_i - 1)/2 \Leftrightarrow (p_i - 1)/2 = 2^s k_i \Leftrightarrow p_i = 2^{s+1} k_i + 1$, $k_i$ a positive integer, i = 1, ..., L.

Example 4.2: Factorize $x^4 + 1$ mod 17.

Here, the order of the polynomial is $N = 4 = 2^2$ and the prime modulus is $p = 17 = 8 \cdot 2 + 1 = 2^{2+1} \cdot 2 + 1$. Corollary 4.1 implies that the congruence $x^4 + 1 \equiv 0 \pmod{17}$ has four distinct solutions in $Z_{17}$. These are $r_1 = 2$, $r_2 = 15$, $r_3 = 8$, $r_4 = 9$ and $x^4 + 1 \equiv (x - 2)(x - 15)(x - 8)(x - 9) \pmod{17}$. To check the results, observe that

$$(x - 2)(x - 15)(x - 8)(x - 9) \bmod 17 = (x^2 - 2x - 15x + 30)(x^2 - 8x - 9x + 72) \bmod 17$$
$$= (x^2 - 17x + 30)(x^2 - 17x + 72) \bmod 17 = (x^2 + 13)(x^2 + 4) \bmod 17$$
$$= (x^4 + 17x^2 + 52) \bmod 17 = x^4 + 1.$$
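The hand check in Example 4.2 can be reproduced mechanically. The sketch below is illustrative only: it finds the roots by brute force and multiplies out the linear factors with coefficients reduced mod p. The function names `poly_mul_mod` and `factor_xn_plus_1` are assumptions made for this example.

```python
def poly_mul_mod(a, b, m):
    """Multiply two coefficient lists (lowest degree first), coefficients mod m."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] = (out[i + j] + ai * bj) % m
    return out

def factor_xn_plus_1(n, p):
    """Return the roots of x^n + 1 mod p and the expanded product of (x - r)."""
    roots = [x for x in range(p) if (pow(x, n, p) + 1) % p == 0]
    prod = [1]
    for r in roots:
        prod = poly_mul_mod(prod, [(-r) % p, 1], p)  # multiply by (x - r)
    return roots, prod

roots, prod = factor_xn_plus_1(4, 17)
print(roots)  # the four roots in increasing order
print(prod)   # coefficients of the product, lowest degree first
```

For N = 4 and p = 17 the product expands to the coefficient list of $x^4 + 1$, matching the check carried out by hand.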

The following theorems prove the existence of $N^{-1}$ and $r_i^{-j}$ mod m, i = 0, ..., N - 1 and j = 1, ..., N - 1, which are required for the computation of $f_N^{-1}$, and they also provide some necessary background for section 4.3.

Theorem 4.12: If $x^N + 1$ can be factored into N distinct factors mod m as $x^N + 1 \equiv (x - r_0)(x - r_1) \cdots (x - r_{N-1}) \pmod m$, then the inverses $r_i^{-j}$ of the roots exist for every i = 0, ..., N - 1 and j = 1, ..., N - 1.

Proof: Since $r_i$ is a root of $x^N \equiv -1 \pmod m$, then $r_i^N \equiv -1 \pmod m$ or $r_i^{N-1} \cdot (-r_i) \equiv 1 \pmod m$. This means that $r_i^{N-1}$ and $(-r_i)$ are multiplicative inverses of each other, or $r_i^{-(N-1)} \equiv -r_i \pmod m$. Similarly, for j = 1, ..., N - 1, $r_i^N \equiv -1 \pmod m$ or $r_i^{N-j} \cdot (-r_i^j) \equiv 1 \pmod m$ and

$$r_i^{-(N-j)} \equiv -r_i^j \pmod m, \quad i = 0, \ldots, N-1, \quad j = 1, \ldots, N-1 \tag{4.13}$$

The existence of $r_i^{-(N-j)}$ follows from the existence of $r_i^j$ and as a result all the inverses exist. Notice that equation (4.13) can equivalently be written as

$$r_i^{-k} \equiv -r_i^{N-k}, \quad i = 0, 1, \ldots, N-1 \text{ and } k = 1, \ldots, N-1 \tag{4.13a}$$

Theorem 4.13: If a and b are integers and (a, b) = 1, then there exist integers r and s such that ra + sb = 1, where (a, b) denotes the greatest common divisor of a and b.

Proof: It is given by Stark [Sta70a].
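Theorem 4.13 is constructive: the extended Euclidean algorithm produces the integers r and s. When b is a modulus m, the coefficient r reduced mod m is the multiplicative inverse of a. The following is a minimal sketch under that standard reading; the function names are hypothetical and the algorithm is the textbook one, not a procedure taken from Stark [Sta70a].

```python
def extended_gcd(a, b):
    """Return (g, r, s) with r*a + s*b == g == gcd(a, b)."""
    if b == 0:
        return a, 1, 0
    g, r1, s1 = extended_gcd(b, a % b)
    # g = r1*b + s1*(a % b) = s1*a + (r1 - (a // b)*s1)*b
    return g, s1, r1 - (a // b) * s1

def inverse_mod(n, m):
    """Multiplicative inverse of n modulo m, if gcd(n, m) == 1."""
    g, r, _ = extended_gcd(n, m)
    if g != 1:
        raise ValueError(f"{n} has no inverse modulo {m}")
    return r % m

print(inverse_mod(4, 17))  # inverse of N = 4 modulo the prime 17
print(inverse_mod(3, 91))  # inverse of N = 3 modulo 91 = 7 * 13
```

Both calls succeed because in each case N is smaller than every prime factor of the modulus, so N and m are relatively prime.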

Theorem 4.14: If m is an odd number with prime factorization $m = p_1^{e_1} \cdot p_2^{e_2} \cdots p_L^{e_L}$, $p_i$ prime, and $N < p_i$ for i = 1, ..., L, then $N^{-1}$ exists modulo m.

Proof: Since $N < p_i$, $p_i$ cannot be a factor of N. Therefore, N and m are relatively prime, or (N, m) = 1. Theorem 4.13 then implies that there exist integers r and s such that $r \cdot N + s \cdot m = 1$, or $r \cdot N = 1 - s \cdot m$. Taking both sides of this equation mod m we get $(r \cdot N) \bmod m \equiv (1 - s \cdot m) \bmod m = 1$, or

$$(r \cdot N) \bmod m \equiv 1 \tag{4.14}$$

From equation (4.14) it follows that $N^{-1} \equiv r \pmod m$, and $N^{-1}$ exists mod m.

Theorem 4.15: If N, m are positive integers, the prime factorization of m is $m = p_1^{e_1} \cdot p_2^{e_2} \cdots p_L^{e_L}$ with $N < p_k$, $p_k$ prime, and $N \mid (p_k - 1)/2$ for k = 1, 2, ..., L, then

$$\sum_{i=0}^{N-1} r_i^j \equiv 0 \pmod m, \quad j = 1, \ldots, N-1 \tag{4.15}$$

Proof: By hypothesis $x^N + 1 \equiv (x - r_0)(x - r_1) \cdots (x - r_{N-1}) \pmod m$. By multiplying out the right-hand side of the above equation we obtain

$$x^N + 1 \equiv x^N - \Big(\sum r_i\Big) x^{N-1} + \Big(\sum r_i r_j\Big) x^{N-2} - \Big(\sum r_i r_j r_k\Big) x^{N-3} + \cdots + (-1)^N r_0 \cdot r_1 \cdots r_{N-1} \pmod m$$

Therefore

$$\sum_{i=0}^{N-1} r_i \equiv 0, \quad \sum r_i r_j \equiv 0, \quad \sum r_i r_j r_k \equiv 0, \quad \ldots, \quad \prod_{i=0}^{N-1} r_i \equiv (-1)^N \tag{4.16}$$

In plain English this means that the sum of all roots is zero, the sum of products of the roots taken two at a time is zero, the sum of products of the roots taken three at a time is zero, ..., and the sum of products of the roots taken N - 1 at a time is zero, while the product of all roots is $(-1)^N$.

Proving equation (4.15) for j = 1 is simple since equation (4.16) states that $\sum_{i=0}^{N-1} r_i \equiv 0$. The proof for j = N - 1 is given next. By equation (4.16) the sum of products of roots taken N - 1 at a time is zero, or

$$\sum_{k=0}^{N-1} \Big(\prod \text{all roots but } r_k\Big) \equiv 0 \tag{4.17}$$

Since $\prod_{i=0}^{N-1} r_i \equiv (-1)^N$, then $\big(\prod \text{all roots but } r_k\big) \equiv (-1)^N r_k^{-1}$ and equation (4.17) gives

$$\sum_{k=0}^{N-1} r_k^{-1} \equiv 0 \tag{4.18}$$

or using equation (4.13a)

$$\sum_{k=0}^{N-1} r_k^{N-1} \equiv 0 \tag{4.19}$$

To prove that $r_0^2 + r_1^2 + \cdots + r_{N-1}^2 \equiv 0 \pmod m$ observe that

$$(r_0 + r_1 + \cdots + r_{N-1})^2 = r_0^2 + r_1^2 + \cdots + r_{N-1}^2 + 2 \sum_{i<j} r_i r_j \tag{4.20}$$

Since $\big(\sum_{i=0}^{N-1} r_i\big)^2 \equiv 0$ and $\sum_{i<j} r_i r_j \equiv 0$, the result of equation (4.20) is

$$\sum_{i=0}^{N-1} r_i^2 \equiv 0 \pmod m \tag{4.21}$$

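To prove equation (4.15) for j = N - 2 we consider the fact that the sum of the products of roots taken N - 2 at a time is zero, or

$$\sum_{k_1 \ne k_2} \Big(\prod \text{all roots but } r_{k_1} \cdot r_{k_2}\Big) \equiv 0 \tag{4.22}$$

which by the fact that $\prod_{i=0}^{N-1} r_i \equiv (-1)^N$ becomes

$$\sum_{\substack{k_1, k_2 = 0 \\ k_1 \ne k_2}}^{N-1} r_{k_1}^{-1} \cdot r_{k_2}^{-1} \equiv 0 \tag{4.23}$$

or by (4.13a)

$$\sum_{\substack{k_1, k_2 = 0 \\ k_1 \ne k_2}}^{N-1} r_{k_1}^{N-1} \cdot r_{k_2}^{N-1} \equiv 0 \tag{4.24}$$

But

$$(r_0^{N-1} + r_1^{N-1} + \cdots + r_{N-1}^{N-1})^2 = r_0^{2N-2} + \cdots + r_{N-1}^{2N-2} + \sum_{\substack{i, j = 0 \\ i \ne j}}^{N-1} r_i^{N-1} \cdot r_j^{N-1}$$

and since $\sum_{k=0}^{N-1} r_k^{N-1} \equiv 0$ (equation (4.19)) and $\sum_{i \ne j} r_i^{N-1} \cdot r_j^{N-1} \equiv 0$ (equation (4.24)), then $\sum_{i=0}^{N-1} r_i^{2N-2} \equiv 0$.

Observe that $r_i^{2N-2} = r_i^{2N} \cdot r_i^{-2} = (r_i^N)^2 \cdot r_i^{-2} \equiv (-1)^2 \cdot r_i^{-2} \pmod m$, so

$$\sum_{i=0}^{N-1} r_i^{-2} \equiv 0 \pmod m$$

or by equation (4.13a)

$$\sum_{i=0}^{N-1} r_i^{N-2} \equiv 0 \pmod m \tag{4.25}$$

and the proof for j = N - 2 is completed.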

Considering the case j = 3,

$$(r_0 + r_1 + \cdots + r_{N-1})^3 = r_0^3 + r_1^3 + \cdots + r_{N-1}^3 + 3 \sum_{\substack{i, j = 0 \\ i \ne j}}^{N-1} r_i^2 \cdot r_j + 6 \sum_{i<j<k} r_i \cdot r_j \cdot r_k \tag{4.26}$$

From equation (4.16), $\sum_{i=0}^{N-1} r_i \equiv 0 \pmod m$ and $\sum_{i<j} r_i r_j \equiv 0 \pmod m$. Multiplying both gives $\sum_{i \ne j} r_i^2 r_j + 3 \sum_{i<j<k} r_i r_j r_k \equiv 0 \pmod m$, and since the triple sum vanishes by (4.16), $\sum_{i \ne j} r_i^2 \cdot r_j \equiv 0 \pmod m$ is obtained.

Since $\big(\sum_{i=0}^{N-1} r_i\big)^3 \equiv 0$ and $\sum_{i<j<k} r_i r_j r_k \equiv 0$, equation (4.26) is converted to

$$\sum_{i=0}^{N-1} r_i^3 \equiv 0 \pmod m \tag{4.27}$$

which proves equation (4.15) for j = 3.

A similar proof can be given for equation (4.15) for any j = 1, ..., N - 1.


Example 4.3: Factorize $x^5 + 1$ over $Z_{11}$ as $x^5 + 1 \equiv (x - r_0)(x - r_1)(x - r_2)(x - r_3)(x - r_4)$ and compute $\big(\sum_{i=0}^{4} r_i^j\big) \bmod 11$ for j = 1, 2, 3, 4.

The modulus is the prime number 11 and $5 \mid (11-1)/2 = 5$. Then $x^5 + 1$ can be factored into five distinct factors mod 11 and, since 11 is prime, the factorization is unique.

The five roots of $x^5 + 1 \equiv 0 \pmod{11}$ are $r_0 = 2$, $r_1 = 6$, $r_2 = 7$, $r_3 = 8$, $r_4 = 10$ ($r_i^5 \equiv -1 \pmod{11}$ for i = 0, ..., 4).

$$\sum_{i=0}^{4} r_i = 2 + 6 + 7 + 8 + 10 = 33 \equiv 0 \pmod{11}$$
$$\sum_{i=0}^{4} r_i^2 = 2^2 + 6^2 + 7^2 + 8^2 + 10^2 = 4 + 36 + 49 + 64 + 100 \equiv 4 + 3 + 5 + 9 + 1 = 22 \equiv 0 \pmod{11}$$
$$\sum_{i=0}^{4} r_i^3 \equiv 8 + 18 + 35 + 72 + 10 \equiv 8 + 7 + 2 + 6 + 10 = 33 \equiv 0 \pmod{11}$$
$$\sum_{i=0}^{4} r_i^4 \equiv 16 + 42 + 14 + 48 + 100 \equiv 5 + 9 + 3 + 4 + 1 = 22 \equiv 0 \pmod{11}$$

Then $\sum_{i=0}^{4} r_i^j \equiv 0 \pmod{11}$ for j = 1, 2, 3, 4, which is in agreement with Theorem 4.15.
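The power sums of Example 4.3 can also be checked in a few lines. This is a small sketch, assuming the roots are found by exhaustive search; `power_sums` is a hypothetical helper name.

```python
def power_sums(roots, m):
    """Return {j: sum of r**j over all roots, reduced mod m} for j = 1..N-1."""
    n = len(roots)
    return {j: sum(pow(r, j, m) for r in roots) % m for j in range(1, n)}

# Roots of x^5 + 1 = 0 mod 11, found by brute force
roots11 = [x for x in range(11) if (pow(x, 5, 11) + 1) % 11 == 0]
sums = power_sums(roots11, 11)

print(roots11)
print(sums)  # every power sum for j = 1..4 should vanish mod 11
```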


Theorem 4.16: For the same hypothesis as in Theorem 4.15,

$$\sum_{i=0}^{N-1} r_i^{-j} \equiv 0 \pmod m, \quad j = 1, \ldots, N-1$$

Proof: Theorem 4.12 implies that $r_i^{-j} \equiv -r_i^{N-j} \pmod m$ and by Theorem 4.15

$$\sum_{i=0}^{N-1} r_i^{-j} \equiv -\sum_{i=0}^{N-1} r_i^{N-j} \equiv 0 \pmod m \quad \text{q.e.d.}$$

4.3 Theoretical Study of the Nth Order PRNS Mapping
Based on the Factorization of $x^N + 1$ in $Z_m$

The validity of the forward and inverse isomorphic mappings $f_N$, $f_N^{-1}$ as well as the simplified rules of PRNS as given by equations (4.7) through (4.12) can be proven now. The following theorems facilitate the study of the PRNS mapping and the rules of composition.

Theorem 4.17: If the polynomial $x^N + 1$ can be factorized into N distinct factors in $Z_m$ as $x^N + 1 \equiv (x - r_0)(x - r_1) \cdots (x - r_{N-1}) \pmod m$, then the PRNS mapping $f_N$ from P(m) onto $Z_m \times Z_m \times \cdots \times Z_m \triangleq Z_m^N$ defined by equation (4.7) is an isomorphism.

Proof: For $f_N$ to be an isomorphism it must satisfy

(i) $f_N$ must be one-to-one and onto [Her75a]

(ii) for $A, B \in P(m)$ it must be

$f_N(A + B) = f_N(A) + f_N(B)$

$f_N(A \cdot B) = f_N(A) \cdot f_N(B)$ [Her75a].

Proof of (i): The number of elements in P(m) is $m^N$ and so is the number of elements in $Z_m \times Z_m \times \cdots \times Z_m \triangleq Z_m^N$. It must now be shown that for $A, B \in P(m)$

$$A \ne B \Rightarrow f_N(A) \ne f_N(B) \tag{4.28}$$

Suppose that for $A = a_0 + a_1 x + \cdots + a_{N-1} x^{N-1}$, $B = b_0 + b_1 x + \cdots + b_{N-1} x^{N-1}$ with $a_i, b_i \in Z_m$ for i = 0, ..., N - 1

$$A \ne B \Rightarrow f_N(A) = f_N(B) \tag{4.29}$$


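But

$$A \ne B \Leftrightarrow a_0 \ne b_0 \text{ or } a_1 \ne b_1 \text{ or } \ldots \text{ or } a_{N-1} \ne b_{N-1} \tag{4.30}$$
$$\Leftrightarrow a_0 - b_0 \not\equiv 0 \pmod m \text{ or } a_1 - b_1 \not\equiv 0 \pmod m \text{ or } \ldots \text{ or } a_{N-1} - b_{N-1} \not\equiv 0 \pmod m$$
$$\Leftrightarrow a_0 - b_0 = k_0 m + \lambda_0 \text{ and } a_1 - b_1 = k_1 m + \lambda_1 \text{ and } \ldots \text{ and } a_{N-1} - b_{N-1} = k_{N-1} m + \lambda_{N-1}$$

with $k_0, k_1, \ldots, k_{N-1}$ integers and $\lambda_0, \lambda_1, \ldots, \lambda_{N-1} \in \{0, 1, \ldots, m - 1\}$ and at least one of $\lambda_0, \ldots, \lambda_{N-1}$ is not zero.

For $f_N(A)$ and $f_N(B)$, equation (4.7) gives

$$f_N(A) = (a_0^*, a_1^*, \ldots, a_{N-1}^*) = (A(x) \bmod (x - r_0), \ldots, A(x) \bmod (x - r_{N-1})) = (A(r_0), A(r_1), \ldots, A(r_{N-1}))$$

or

$$f_N(A) = (a_0 + a_1 r_0 + \cdots + a_{N-1} r_0^{N-1},\; a_0 + a_1 r_1 + \cdots + a_{N-1} r_1^{N-1},\; \ldots,\; a_0 + a_1 r_{N-1} + \cdots + a_{N-1} r_{N-1}^{N-1}) \tag{4.31}$$

Similarly,

$$f_N(B) = (b_0 + b_1 r_0 + \cdots + b_{N-1} r_0^{N-1},\; b_0 + b_1 r_1 + \cdots + b_{N-1} r_1^{N-1},\; \ldots,\; b_0 + b_1 r_{N-1} + \cdots + b_{N-1} r_{N-1}^{N-1}).$$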

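For $f_N(A) = f_N(B)$ it must be

$$\begin{aligned}
(a_0 - b_0) + (a_1 - b_1) r_0 + \cdots + (a_{N-1} - b_{N-1}) r_0^{N-1} &\equiv 0 \pmod m \\
(a_0 - b_0) + (a_1 - b_1) r_1 + \cdots + (a_{N-1} - b_{N-1}) r_1^{N-1} &\equiv 0 \pmod m \\
&\;\;\vdots \\
(a_0 - b_0) + (a_1 - b_1) r_{N-1} + \cdots + (a_{N-1} - b_{N-1}) r_{N-1}^{N-1} &\equiv 0 \pmod m
\end{aligned} \tag{4.32}$$

Substituting for $(a_i - b_i)$ in equation (4.32) from equation (4.30) for every i = 0, ..., N - 1 we get

$$\begin{aligned}
(k_0 m + \lambda_0) + (k_1 m + \lambda_1) r_0 + \cdots + (k_{N-1} m + \lambda_{N-1}) r_0^{N-1} &\equiv 0 \pmod m \\
(k_0 m + \lambda_0) + (k_1 m + \lambda_1) r_1 + \cdots + (k_{N-1} m + \lambda_{N-1}) r_1^{N-1} &\equiv 0 \pmod m \\
&\;\;\vdots \\
(k_0 m + \lambda_0) + (k_1 m + \lambda_1) r_{N-1} + \cdots + (k_{N-1} m + \lambda_{N-1}) r_{N-1}^{N-1} &\equiv 0 \pmod m
\end{aligned}$$

or

$$\begin{aligned}
(k_0 + k_1 r_0 + k_2 r_0^2 + \cdots + k_{N-1} r_0^{N-1}) m + (\lambda_0 + \lambda_1 r_0 + \lambda_2 r_0^2 + \cdots + \lambda_{N-1} r_0^{N-1}) &\equiv 0 \pmod m \\
(k_0 + k_1 r_1 + k_2 r_1^2 + \cdots + k_{N-1} r_1^{N-1}) m + (\lambda_0 + \lambda_1 r_1 + \lambda_2 r_1^2 + \cdots + \lambda_{N-1} r_1^{N-1}) &\equiv 0 \pmod m \\
&\;\;\vdots \\
(k_0 + k_1 r_{N-1} + k_2 r_{N-1}^2 + \cdots + k_{N-1} r_{N-1}^{N-1}) m + (\lambda_0 + \lambda_1 r_{N-1} + \lambda_2 r_{N-1}^2 + \cdots + \lambda_{N-1} r_{N-1}^{N-1}) &\equiv 0 \pmod m
\end{aligned}$$

or

$$\begin{aligned}
\lambda_0 + \lambda_1 r_0 + \lambda_2 r_0^2 + \cdots + \lambda_{N-1} r_0^{N-1} &\equiv 0 \pmod m \\
\lambda_0 + \lambda_1 r_1 + \lambda_2 r_1^2 + \cdots + \lambda_{N-1} r_1^{N-1} &\equiv 0 \pmod m \\
&\;\;\vdots \\
\lambda_0 + \lambda_1 r_{N-1} + \lambda_2 r_{N-1}^2 + \cdots + \lambda_{N-1} r_{N-1}^{N-1} &\equiv 0 \pmod m
\end{aligned} \tag{4.33}$$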
Since $x^N + 1 \equiv (x - r_0)(x - r_1) \cdots (x - r_{N-1}) \pmod m$, then from Theorem 4.15

$$\sum_{i=0}^{N-1} r_i \equiv 0 \pmod m, \quad \sum_{i=0}^{N-1} r_i^2 \equiv 0 \pmod m, \quad \ldots, \quad \sum_{i=0}^{N-1} r_i^{N-1} \equiv 0 \pmod m \tag{4.34}$$

Adding all N congruences in (4.33) we get

$$N \cdot \lambda_0 + \lambda_1 \cdot \sum_{i=0}^{N-1} r_i + \lambda_2 \cdot \sum_{i=0}^{N-1} r_i^2 + \cdots + \lambda_{N-1} \cdot \sum_{i=0}^{N-1} r_i^{N-1} \equiv 0 \pmod m \tag{4.35}$$

and since equation (4.34) holds true, equation (4.35) reduces to

$$N \cdot \lambda_0 \equiv 0 \pmod m \tag{4.36}$$

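But Theorem 4.1 implies that 0 < N < m, so $N \not\equiv 0 \pmod m$; moreover, under the hypothesis $N < p_i$ of Theorem 4.14, N is invertible mod m, and equation (4.36) means that $\lambda_0 \equiv 0 \pmod m$. The system of congruences (4.33) now becomes

$$\begin{aligned}
r_0 (\lambda_1 + \lambda_2 r_0 + \cdots + \lambda_{N-1} r_0^{N-2}) &\equiv 0 \pmod m \\
r_1 (\lambda_1 + \lambda_2 r_1 + \cdots + \lambda_{N-1} r_1^{N-2}) &\equiv 0 \pmod m \\
&\;\;\vdots \\
r_{N-1} (\lambda_1 + \lambda_2 r_{N-1} + \cdots + \lambda_{N-1} r_{N-1}^{N-2}) &\equiv 0 \pmod m
\end{aligned} \tag{4.37}$$

Since $r_i \not\equiv 0 \pmod m$ and $r_i \in Z_m$ for every i = 0, ..., N - 1 (each $r_i$ is in fact invertible mod m by Theorem 4.12), equation (4.37) reduces to

$$\begin{aligned}
\lambda_1 + \lambda_2 r_0 + \cdots + \lambda_{N-1} r_0^{N-2} &\equiv 0 \pmod m \\
\lambda_1 + \lambda_2 r_1 + \cdots + \lambda_{N-1} r_1^{N-2} &\equiv 0 \pmod m \\
&\;\;\vdots \\
\lambda_1 + \lambda_2 r_{N-1} + \cdots + \lambda_{N-1} r_{N-1}^{N-2} &\equiv 0 \pmod m
\end{aligned} \tag{4.38}$$

where the summation of all the above congruences gives $\lambda_1 \equiv 0 \pmod m$ (equation (4.34), $N \not\equiv 0 \pmod m$). Repeating the above procedure, we obtain

$$\lambda_0 \equiv \lambda_1 \equiv \cdots \equiv \lambda_{N-1} \equiv 0 \pmod m \tag{4.39}$$

But since, as stated before, at least one of $\lambda_0, \lambda_1, \ldots, \lambda_{N-1}$ must be nonzero, equation (4.39) creates a contradiction. The contradiction was a result of the wrong assumption of equation (4.29), and therefore $f_N(A) \ne f_N(B)$ and the proof of (i) is now completed.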

Proof of (ii): For $A, B \in P(m)$ chosen as before, $f_N(A)$ and $f_N(B)$ are given by equation (4.31). Then

$$f_N(A) + f_N(B) = \big((a_0 + b_0) \bmod m + (a_1 + b_1) r_0 \bmod m + \cdots + (a_{N-1} + b_{N-1}) r_0^{N-1} \bmod m,\; \ldots,\; (a_0 + b_0) \bmod m + (a_1 + b_1) r_{N-1} \bmod m + \cdots + (a_{N-1} + b_{N-1}) r_{N-1}^{N-1} \bmod m\big) \tag{4.40}$$

$$A + B = (a_0 + b_0) \bmod m + (a_1 + b_1) x \bmod m + \cdots + (a_{N-1} + b_{N-1}) x^{N-1} \bmod m \tag{4.41}$$

and by equation (4.7)

$$f_N(A + B) = ((A + B)(r_0), (A + B)(r_1), \ldots, (A + B)(r_{N-1}))$$

or

$$f_N(A + B) = \big((a_0 + b_0) \bmod m + (a_1 + b_1) r_0 \bmod m + \cdots + (a_{N-1} + b_{N-1}) r_0^{N-1} \bmod m,\; \ldots,\; (a_0 + b_0) \bmod m + (a_1 + b_1) r_{N-1} \bmod m + \cdots + (a_{N-1} + b_{N-1}) r_{N-1}^{N-1} \bmod m\big) \tag{4.42}$$

Comparing equations (4.40) and (4.42) we find that $f_N(A + B) = f_N(A) + f_N(B)$.

For the multiplication

$$A \cdot B = (a_0 + a_1 x + \cdots + a_{N-1} x^{N-1})(b_0 + b_1 x + \cdots + b_{N-1} x^{N-1}) \bmod (x^N + 1)$$

and

$$f_N(A \cdot B) = (AB(r_0), AB(r_1), \ldots, AB(r_{N-1})) \tag{4.43}$$
$$= \Big(\big[(a_0 + a_1 r_0 + \cdots + a_{N-1} r_0^{N-1})(b_0 + b_1 r_0 + \cdots + b_{N-1} r_0^{N-1}) \bmod (r_0^N + 1)\big] \bmod m,\; \ldots,\; \big[(a_0 + a_1 r_{N-1} + \cdots + a_{N-1} r_{N-1}^{N-1})(b_0 + b_1 r_{N-1} + \cdots + b_{N-1} r_{N-1}^{N-1}) \bmod (r_{N-1}^N + 1)\big] \bmod m\Big)$$

In equation (4.43), mod $(r_0^N + 1)$ has the meaning of substituting $r_0^N$ by -1, exactly as the operation mod $(x^N + 1)$ means substituting $x^N$ by -1 in the product $A \cdot B$. Evaluating $f_N(A) \cdot f_N(B)$ and taking into account that $r_i^N \equiv -1 \pmod m$, i = 0, ..., N - 1, it turns out that $f_N(A \cdot B) = f_N(A) \cdot f_N(B)$ and (ii) is proven. So $f_N$ is an isomorphism from P(m) onto $Z_m^N$.
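The isomorphism can be illustrated for a small case. The sketch below assumes N = 4, m = 17 and the roots of Example 4.2; it treats $f_N$ as evaluation of A(x) at every root, which is how the proof above uses equation (4.7). The function names and the two test polynomials are arbitrary choices made for illustration.

```python
from itertools import product

M, N = 17, 4
ROOTS = [2, 8, 9, 15]  # roots of x^4 + 1 mod 17, as in Example 4.2
POWERS = [[pow(r, k, M) for k in range(N)] for r in ROOTS]

def forward(a):
    """f_N: residues A(r_i) of A(x) = a[0] + a[1] x + ... modulo (x - r_i)."""
    return tuple(sum(c * p for c, p in zip(a, row)) % M for row in POWERS)

def poly_add(a, b):
    """Coefficient-wise addition mod m."""
    return [(x + y) % M for x, y in zip(a, b)]

def poly_mul(a, b):
    """Product in P(m): reduce mod (x^N + 1) by replacing x^N with -1."""
    out = [0] * N
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            sign = 1 if i + j < N else -1
            out[(i + j) % N] = (out[(i + j) % N] + sign * ai * bj) % M
    return out

A, B = [3, 0, 16, 5], [1, 7, 2, 11]  # arbitrary test polynomials
fa, fb = forward(A), forward(B)
add_ok = forward(poly_add(A, B)) == tuple((x + y) % M for x, y in zip(fa, fb))
mul_ok = forward(poly_mul(A, B)) == tuple((x * y) % M for x, y in zip(fa, fb))
# One-to-one: all m^N polynomials must map to distinct residue vectors.
one_to_one = len({forward(a) for a in product(range(M), repeat=N)}) == M ** N
print(add_ok, mul_ok, one_to_one)
```

The practical point is in `mul_ok`: a polynomial product that needs on the order of $N^2$ coefficient multiplications in P(m) becomes N independent multiplications in the residue domain.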


Theorem 4.18: The mapping $f_N^{-1}: (a_0^*, a_1^*, \ldots, a_{N-1}^*) \xrightarrow{f_N^{-1}} A(x)$, with $A(x) = \big(\sum_{i=0}^{N-1} a_i^* \cdot Q_i(x)\big) \bmod (x^N + 1)$ and $Q_i(x)$ given by equation (4.9), is the valid inverse mapping of $f_N$ described by equation (4.7).

Proof: Consider $A \in P(m)$ with $A = a_0 + a_1 x + \cdots + a_{N-1} x^{N-1}$. Then, by equation (4.7), $f_N(A) = (a_0^*, a_1^*, \ldots, a_{N-1}^*)$ with

$$a_i^* = (a_{N-1} r_i^{N-1} + a_{N-2} r_i^{N-2} + \cdots + a_1 r_i + a_0) \bmod m, \quad i = 0, 1, \ldots, N-1 \tag{4.44}$$

From equation (4.9)

$$a_i^* Q_i(x) = N^{-1} \cdot (r_i^{-(N-1)} x^{N-1} + r_i^{-(N-2)} x^{N-2} + \cdots + r_i^{-2} x^2 + r_i^{-1} x + 1) \cdot (a_{N-1} r_i^{N-1} + a_{N-2} r_i^{N-2} + \cdots + a_2 r_i^2 + a_1 r_i + a_0) \bmod m \tag{4.45}$$

$$\begin{aligned}
= N^{-1} \big(\, & a_{N-1} x^{N-1} + a_{N-1} r_i x^{N-2} + a_{N-1} r_i^2 x^{N-3} + \cdots + a_{N-1} r_i^{N-2} x + a_{N-1} r_i^{N-1} \\
+\, & a_{N-2} r_i^{-1} x^{N-1} + a_{N-2} x^{N-2} + a_{N-2} r_i x^{N-3} + \cdots + a_{N-2} r_i^{N-3} x + a_{N-2} r_i^{N-2} \\
+\, & \cdots \\
+\, & a_1 r_i^{-(N-2)} x^{N-1} + a_1 r_i^{-(N-3)} x^{N-2} + \cdots + a_1 x + a_1 r_i \\
+\, & a_0 r_i^{-(N-1)} x^{N-1} + a_0 r_i^{-(N-2)} x^{N-2} + \cdots + a_0 r_i^{-1} x + a_0 \big)
\end{aligned}$$

By combining Theorem 4.15, Theorem 4.16 (i.e. $\sum_{i=0}^{N-1} r_i^{\pm j} \equiv 0 \pmod m$, j = 1, ..., N - 1) and (4.45), the expression for A(x) becomes

$$A(x) = \Big(\sum_{i=0}^{N-1} a_i^* Q_i(x)\Big) \bmod (x^N + 1) = N^{-1} (N a_{N-1} x^{N-1} + \cdots + N a_1 x + N a_0) \tag{4.46}$$

The proof is completed since it has been shown that $f_N^{-1}(f_N(A)) = A$.

Theorem 4.19: The polynomials $Q_i(x)$ of equation (4.9) also satisfy

$$Q_i(x) \equiv \delta_{ij} \bmod (x - r_j), \quad i, j = 0, \ldots, N-1.$$

Proof: Recall that $x^N + 1 \equiv (x - r_0)(x - r_1) \cdots (x - r_{N-1}) \pmod m$. Define the polynomial $P_i(x)$ as $P_i(x) = (x^N + 1)/(x - r_i)$. Then, from the factorization of $x^N + 1$,

$$P_i(x) = (x - r_0)(x - r_1) \cdots (x - r_{i-1})(x - r_{i+1}) \cdots (x - r_{N-1}) \tag{4.47}$$

Division of $x^N + 1$ by $x - r_i$ through long division gives

$$P_i(x) = x^{N-1} + r_i x^{N-2} + \cdots + r_i^{N-2} x + r_i^{N-1} \tag{4.48}$$

so that

$$P_i(x) \bmod (x - r_i) = P_i(r_i) = N r_i^{N-1}$$

Define $R_i$ as

$$R_i = (N \cdot r_i^{N-1})^{-1} = N^{-1} r_i^{-(N-1)} \bmod m \tag{4.49}$$

Theorems 4.12 and 4.14 imply that $r_i^{-(N-1)}$ and $N^{-1}$ exist mod m for i = 0, ..., N - 1. Using equations (4.48) and (4.49) for $P_i(x)$ and $R_i$, the product $R_i P_i(x)$ becomes

$$R_i \cdot P_i(x) = N^{-1} r_i^{-(N-1)} (x^{N-1} + r_i x^{N-2} + \cdots + r_i^{N-2} x + r_i^{N-1})$$
$$= N^{-1} (r_i^{-(N-1)} x^{N-1} + r_i^{-(N-2)} x^{N-2} + \cdots + r_i^{-1} x + 1) = Q_i(x) \tag{4.50}$$

and $R_i \cdot P_i(x) = Q_i(x)$. We need to prove that $Q_i(x) \equiv \delta_{ij} \bmod (x - r_j)$. This is equivalent to proving that $Q_i(x) \bmod (x - r_i) \equiv 1 \pmod m$ and $Q_i(x) \bmod (x - r_j) \equiv 0 \pmod m$ ($i \ne j$).

$$Q_i(x) \bmod (x - r_i) = [R_i P_i(x)] \bmod (x - r_i) = R_i P_i(r_i) = (N r_i^{N-1})^{-1} \cdot N r_i^{N-1} \equiv 1 \pmod m$$

and $Q_i(x) \bmod (x - r_i) \equiv 1 \pmod m$ has been proven.

To prove that $Q_i(x) \bmod (x - r_j) \equiv 0 \pmod m$ consider the following: for $j \ne i$, $r_j$ is a root of $P_i(x)$ since, by equation (4.47), $P_i(x)$ has as roots all the roots of $x^N + 1$ besides $r_i$. Since $r_j$ is a root of $P_i(x)$, then $P_i(r_j) \equiv 0 \pmod m$ or, using equation (4.48),

$$P_i(r_j) = r_j^{N-1} + r_i r_j^{N-2} + \cdots + r_i^{N-2} r_j + r_i^{N-1} \equiv 0 \pmod m$$

Multiplying both sides of the above by $-r_i$ we get

$$-r_i r_j^{N-1} - r_i^2 r_j^{N-2} - \cdots - r_i^{N-1} r_j - r_i^N \equiv 0 \pmod m \tag{4.51}$$

Now

$$Q_i(x) \bmod (x - r_j) = [R_i \cdot P_i(x)] \bmod (x - r_j) = R_i \cdot P_i(r_j) = N^{-1} r_i^{-(N-1)} (r_j^{N-1} + r_i r_j^{N-2} + \cdots + r_i^{N-2} r_j + r_i^{N-1})$$

Since $r_i^{-(N-1)} \equiv -r_i$ (by Theorem 4.12, equation (4.13a)), and by equation (4.51),

$$Q_i(x) \bmod (x - r_j) \equiv N^{-1} (-r_i r_j^{N-1} - r_i^2 r_j^{N-2} - \cdots - r_i^{N-1} r_j - r_i^N) \equiv 0 \pmod m \tag{4.52}$$

Then $Q_i(x) \bmod (x - r_i) \equiv 1 \pmod m$, $Q_i(x) \bmod (x - r_j) \equiv 0 \pmod m$, or $Q_i(x) \equiv \delta_{ij} \bmod (x - r_j)$, q.e.d.

The simplified rules of composition in the PRNS, given by equations (4.11) and (4.12), are justified by Theorems 4.17 and 4.18.
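The inverse mapping and the Kronecker-delta property of Theorem 4.19 can be checked for the same small case, N = 4 and m = 17. The sketch below assumes $Q_i(x) = N^{-1} \sum_k r_i^{-k} x^k$, which is the form the polynomials take in the proof of Theorem 4.19; equation (4.9) itself is not reproduced here. The function names are hypothetical, and `pow(x, -1, m)` requires Python 3.8 or later.

```python
M, N = 17, 4
ROOTS = [2, 8, 9, 15]          # roots of x^4 + 1 mod 17
N_INV = pow(N, -1, M)          # N^{-1} mod m (Theorem 4.14 guarantees it exists)

def q_poly(r):
    """Coefficients of Q_i(x) = N^{-1} * sum_k r^{-k} x^k, lowest degree first."""
    r_inv = pow(r, -1, M)
    return [N_INV * pow(r_inv, k, M) % M for k in range(N)]

def evaluate(poly, x):
    """Evaluate a coefficient list at x, arithmetic mod m."""
    return sum(c * pow(x, k, M) for k, c in enumerate(poly)) % M

def forward(a):
    """f_N: evaluate A(x) at each root."""
    return [evaluate(a, r) for r in ROOTS]

def inverse(residues):
    """f_N^{-1}: A(x) = sum_i a_i* Q_i(x), coefficients mod m."""
    out = [0] * N
    for a_star, r in zip(residues, ROOTS):
        for k, q in enumerate(q_poly(r)):
            out[k] = (out[k] + a_star * q) % M
    return out

# delta[i][j] = Q_i(r_j); Theorem 4.19 predicts the identity matrix.
delta = [[evaluate(q_poly(ri), rj) for rj in ROOTS] for ri in ROOTS]
A = [3, 0, 16, 5]              # arbitrary test polynomial
round_trip = inverse(forward(A))
print(delta)
print(round_trip)
```

Because every $Q_i$ already has degree at most N - 1, no reduction mod $(x^N + 1)$ is needed when the weighted sum is formed.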

Example 4.4: This example demonstrates that the QRNS given by equations (3.1) and (3.2) is indeed a PRNS of order 2. The example also demonstrates the validity of Theorem 4.19.

As was already discussed at the beginning of section 4.1, the product in $Z_m$ of two complex numbers $A = a_0 + j a_1$ and $B = b_0 + j b_1$ with $a_0, a_1, b_0, b_1 \in Z_m$ can be mechanized as a product of two first-order polynomials $A(x) \cdot B(x) \bmod (x^2 + 1)$ with arithmetic performed mod m, where $A(x) = a_0 + a_1 x$ and $B(x) = b_0 + b_1 x$, $a_0, a_1, b_0, b_1 \in Z_m$.

To demonstrate that the QRNS is a 2nd-order PRNS, consider a first-order polynomial $A(x) = a_0 + a_1 x$ with $a_0, a_1 \in Z_m$ and the factorization of $(x^2 + 1) \bmod m$ as

$$x^2 + 1 \equiv (x - r_0)(x - r_1) \tag{4.53}$$

It was found that the two roots are $r_0 = j$ and $r_1 = -j$. Using equation (4.7) we get for the forward PRNS mapping for N = 2: $a_0^* = A(x) \bmod (x - r_0) = A(r_0) = A(j) = a_0 + j a_1$ and $a_1^* = A(x) \bmod (x - r_1) = A(r_1) = A(-j) = a_0 - j a_1$, or

$$a_0^* = (a_0 + j a_1) \bmod m \tag{4.54}$$
$$a_1^* = (a_0 - j a_1) \bmod m$$

Comparison of equations (4.54) and (3.1) shows that (4.54) reflects the QRNS mapping.

Equations (4.8) and (4.9) give for the inverse 2nd-order PRNS mapping

$$A(x) = a_0^* Q_0(x) + a_1^* Q_1(x) \tag{4.55}$$

where $Q_0(x) = 2^{-1}(1 + r_0^{-1} x) = 2^{-1}(1 + j^{-1} x)$ and $Q_1(x) = 2^{-1}(1 + r_1^{-1} x) = 2^{-1}(1 + (-j)^{-1} x) = 2^{-1}(1 + (-1)^{-1} j^{-1} x) = 2^{-1}(1 - j^{-1} x)$, or

$$Q_0(x) = 2^{-1}(1 + j^{-1} x) \tag{4.56}$$
$$Q_1(x) = 2^{-1}(1 - j^{-1} x)$$

If $A(x) = a_0 + a_1 x$, then equations (4.55) and (4.56) give

$$a_0 = (2^{-1}(a_0^* + a_1^*)) \bmod m \tag{4.57}$$
$$a_1 = (2^{-1} j^{-1}(a_0^* - a_1^*)) \bmod m$$

Since equations (4.57) and (3.2) are the same, (4.57) provides the inverse QRNS equations.

To demonstrate the validity of Theorem 4.19 (i.e. the fact that $Q_i(x) \equiv \delta_{ij} \bmod (x - r_j)$) observe that

$$Q_0(x) \bmod (x - r_0) = Q_0(r_0) = Q_0(j) = 2^{-1}(1 + j^{-1} \cdot j) = 2^{-1}(1 + 1) = 2^{-1} \cdot 2 = 1,$$
$$Q_0(x) \bmod (x - r_1) = Q_0(r_1) = Q_0(-j) = 2^{-1}(1 + j^{-1} \cdot (-j)) = 2^{-1}(1 - 1) = 0,$$
$$Q_1(x) \bmod (x - r_0) = Q_1(r_0) = Q_1(j) = 2^{-1}(1 - j^{-1} \cdot j) = 2^{-1}(1 - 1) = 0,$$
$$Q_1(x) \bmod (x - r_1) = Q_1(r_1) = Q_1(-j) = 2^{-1}(1 - j^{-1} \cdot (-j)) = 2^{-1}(1 + 1) = 2^{-1} \cdot 2 = 1,$$

or

$$Q_0(x) \bmod (x - r_0) \equiv 1 \pmod m$$
$$Q_0(x) \bmod (x - r_1) \equiv 0 \pmod m$$
$$Q_1(x) \bmod (x - r_0) \equiv 0 \pmod m$$
$$Q_1(x) \bmod (x - r_1) \equiv 1 \pmod m$$

and in abbreviated form

$$Q_i(x) \equiv \delta_{ij} \bmod (x - r_j), \quad i, j = 0, 1$$
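A small numerical illustration of the 2nd-order case may help. The sketch below assumes m = 13 and j = 5 (since $5^2 = 25 \equiv -1 \pmod{13}$); these values and the function names are illustrative choices, not values used in the text. It applies the forward mapping (4.54), multiplies the residues independently, and recovers the complex product with the inverse mapping (4.57).

```python
M = 13
J = 5                        # assumed root of x^2 + 1 mod 13: 5*5 = 25 = -1 (mod 13)
J_INV = pow(J, -1, M)        # j^{-1} mod m (Python 3.8+)
TWO_INV = pow(2, -1, M)      # 2^{-1} mod m

def qrns_forward(a0, a1):
    """Equation (4.54): (a0*, a1*) = (a0 + j a1, a0 - j a1) mod m."""
    return ((a0 + J * a1) % M, (a0 - J * a1) % M)

def qrns_inverse(s0, s1):
    """Equation (4.57): a0 = 2^-1 (a0* + a1*), a1 = 2^-1 j^-1 (a0* - a1*)."""
    return (TWO_INV * (s0 + s1) % M, TWO_INV * J_INV * (s0 - s1) % M)

def complex_mul(a, b):
    """Reference complex product (a0 + j a1)(b0 + j b1) with components mod m."""
    return ((a[0] * b[0] - a[1] * b[1]) % M, (a[0] * b[1] + a[1] * b[0]) % M)

A, B = (3, 4), (2, 11)       # arbitrary operands a0 + j a1, b0 + j b1
sa, sb = qrns_forward(*A), qrns_forward(*B)
prod_star = tuple(x * y % M for x, y in zip(sa, sb))  # two independent products
result = qrns_inverse(*prod_star)
print(result, complex_mul(A, B))
```

The residue-domain product uses two modular multiplications, whereas the direct complex product uses four.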




In addition to the relationships between the roots of $x^N + 1 \equiv 0 \pmod m$ described by Theorems 4.15 and 4.16 (i.e. $\sum_{i=0}^{N-1} r_i^{\pm j} \equiv 0 \pmod m$, j = 1, ..., N - 1, and $\prod_{i=0}^{N-1} r_i \equiv (-1)^N$), the following two theorems provide some more interesting relationships:


Theorem 4.20: If N is even and $r_i$ satisfies the congruence $x^N + 1 \equiv 0 \pmod m$, then the additive inverse $-r_i \bmod m$ of $r_i$ also satisfies $x^N + 1 \equiv 0 \pmod m$.

Proof: Since $r_i$ satisfies $x^N + 1 \equiv 0 \pmod m$, then $r_i^N \equiv -1 \pmod m$. But since N is even, $(-r_i)^N = (-1)^N \cdot r_i^N \bmod m \equiv r_i^N \bmod m \equiv -1 \pmod m$. So $(-r_i)^N \equiv -1 \pmod m$ and $-r_i$ satisfies $x^N + 1 \equiv 0 \pmod m$.

Theorem 4.21: If $r_i$ satisfies $x^N + 1 \equiv 0 \pmod m$, then the multiplicative inverse $r_i^{-1} \bmod m$ of $r_i$ also satisfies $x^N + 1 \equiv 0 \pmod m$.

Proof: Since $r_i$ satisfies $x^N + 1 \equiv 0 \pmod m$, then $r_i^N \equiv -1 \pmod m$. Theorem 4.12 and more specifically equation (4.13a) imply that $r_i^{-1} \equiv -r_i^{N-1} \pmod m$. Therefore, $(r_i^{-1})^N \equiv (-r_i^{N-1})^N = ((-1) \cdot r_i^{N-1})^N = (-1)^N \cdot (r_i^{N-1})^N \equiv (-1)^N \cdot (r_i^N)^{N-1} \equiv (-1)^N \cdot (-1)^{N-1} = (-1)^{2N-1} = -1$ (because 2N - 1 is odd). Then $(r_i^{-1})^N \equiv -1 \pmod m$ and $r_i^{-1}$ satisfies $x^N + 1 \equiv 0 \pmod m$.

Example 4.5: Factorize $x^4 + 1$ mod 41.

Since 41 is prime and $4 \mid (41 - 1)/2 = 20$, then according to Theorem 4.10 there exists a unique factorization of $x^4 + 1$ into four factors, $x^4 + 1 \equiv (x - r_0)(x - r_1)(x - r_2)(x - r_3)$. The numbers $r_0, r_1, r_2, r_3$ are the only numbers that satisfy $x^4 + 1 \equiv 0 \pmod{41}$. Theorems 4.20 and 4.21 imply that since 4 is even and $r_0$ satisfies $x^4 + 1 \equiv 0 \pmod{41}$, then $-r_0$, $r_0^{-1}$ and $-r_0^{-1}$ should also satisfy the same congruence. Therefore, the roots are $r_0$, $-r_0$, $r_0^{-1}$, $-r_0^{-1}$. Such an $r_0$ is $r_0 = 3$ (because $r_0^4 = 3^4 = 81 \equiv 40 \equiv -1$) and $-r_0 = 41 - 3 = 38$, $r_0^{-1} = 14$ (because $3 \cdot 14 = 42 \equiv 1 \pmod{41}$) and $-r_0^{-1} = 27$. So the unique factorization of $x^4 + 1$ is

$$x^4 + 1 \equiv (x - 3)(x - 38)(x - 14)(x - 27)$$
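The construction used in Example 4.5, in which a single root generates the other three through Theorems 4.20 and 4.21, can be sketched as follows. The helper name `roots_from_one` is hypothetical, and `pow(r, -1, m)` assumes Python 3.8 or later.

```python
def roots_from_one(r0, n, m):
    """Given one root r0 of x^n + 1 mod m (n even), return r0, -r0, r0^-1, -r0^-1."""
    if pow(r0, n, m) != m - 1:
        raise ValueError("r0 is not a root of x^n + 1 mod m")
    r_inv = pow(r0, -1, m)
    return [r0 % m, (-r0) % m, r_inv, (-r_inv) % m]

roots41 = roots_from_one(3, 4, 41)
print(roots41)

# Cross-check against an exhaustive search over Z_41
brute = [x for x in range(41) if (pow(x, 4, 41) + 1) % 41 == 0]
print(sorted(roots41) == brute)
```

For N = 4 the four values obtained this way exhaust the roots; for larger N the same two theorems still apply but generate only part of the root set.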

Example 4.6: Factorize $x^3 + 1$ mod 91 and demonstrate the validity of Theorem 4.21.

Here the modulus is $m = 91 = 7 \cdot 13$ with $3 \mid (7-1)/2 = 3$ and $3 \mid (13-1)/2 = 6$. So by Theorem 4.1 there are distinct $r_0, r_1, r_2$ such that $x^3 + 1 \equiv (x - r_0)(x - r_1)(x - r_2)$. This factorization is not unique; there are $(3!)^{2-1} = 3! = 6$ different factorizations. Following the procedure described by Theorem 4.1 and Example 4.1, Downing [Dow86a] provides the following sets of roots and factorizations of $x^3 + 1$ mod 91.

Set 1 = {-1, -16, 17} → $x^3 + 1 \equiv (x + 1)(x + 16)(x - 17)$

Set 2 = {-1, -9, 10} → $x^3 + 1 \equiv (x + 1)(x + 9)(x - 10)$

Set 3 = {-29, 12, 17} → $x^3 + 1 \equiv (x + 29)(x - 12)(x - 17)$

Set 4 = {-29, -9, 38} → $x^3 + 1 \equiv (x + 29)(x + 9)(x - 38)$

Set 5 = {-22, 12, 10} → $x^3 + 1 \equiv (x + 22)(x - 12)(x - 10)$

Set 6 = {-22, -16, 38} → $x^3 + 1 \equiv (x + 22)(x + 16)(x - 38)$

Obviously, the numbers -1, -16, 17, -9, 10, -29, 12, 38, -22 satisfy $x^3 + 1 \equiv 0 \pmod{91}$. To demonstrate the validity of Theorem 4.21 we must show that the multiplicative inverses of the above numbers also satisfy $x^3 + 1 \equiv 0 \pmod{91}$. Since $r_i^{-1} \equiv -r_i^{N-1}$ (equation (4.13a), Theorem 4.12), then

$(-1)^{-1} \equiv -(-1)^2 = -1$,
$(-16)^{-1} \equiv -(-16)^2 = -256 \equiv -74 \equiv 91 - 74 = 17$,
$17^{-1} \equiv -17^2 = -289 \equiv -16$,
$(-9)^{-1} \equiv -(-9)^2 = -81 \equiv 91 - 81 = 10$,
$10^{-1} \equiv -10^2 = -100 \equiv -9$,
$(-29)^{-1} \equiv -(-29)^2 = -841 \equiv -22$,
$12^{-1} \equiv -12^2 = -144 \equiv -53 \equiv 38$,
$38^{-1} \equiv -38^2 = -1444 \equiv -79 \equiv 12$,
$(-22)^{-1} \equiv -(-22)^2 = -484 \equiv -29$,

and the produced inverses -1, 17, -16, 10, -9, -22, 38, 12, -29 satisfy $x^3 + 1 \equiv 0 \pmod{91}$.
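The six factorizations and the inverse property in Example 4.6 can be reproduced by exhaustive search. The sketch below is not the procedure of Theorem 4.1 or of Downing [Dow86a]; it simply tests every 3-element subset of the nine roots against the coefficients of $x^3 + 1$. The helper name is hypothetical, and `pow(r, -1, m)` assumes Python 3.8 or later.

```python
from itertools import combinations

M, N = 91, 3
roots = [x for x in range(M) if (pow(x, N, M) + 1) % M == 0]

def expands_to_x3_plus_1(triple):
    """True if (x - r0)(x - r1)(x - r2) = x^3 + 1 with coefficients mod M."""
    r0, r1, r2 = triple
    e1 = (r0 + r1 + r2) % M
    e2 = (r0 * r1 + r0 * r2 + r1 * r2) % M
    e3 = (r0 * r1 * r2) % M
    # (x - r0)(x - r1)(x - r2) = x^3 - e1 x^2 + e2 x - e3
    return e1 == 0 and e2 == 0 and (-e3) % M == 1

factorizations = [t for t in combinations(roots, N) if expands_to_x3_plus_1(t)]
# Theorem 4.21: the inverse of every root is again a root.
inverses_ok = all(pow(pow(r, -1, M), N, M) == M - 1 for r in roots)

print(len(roots), len(factorizations), inverses_ok)
for t in factorizations:
    print(t)
```

The roots are printed as representatives in 0..90, so Set 1 = {-1, -16, 17} appears as (17, 75, 90).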

4.4 The Nth-Order PRNS Based on the
Factorization of $x^N - 1$ in $Z_m$

Several applications that will be discussed in later chapters require the product of two (N-1)-order polynomials modulo $(x^N - 1)$ over some modular ring $Z_m$. For such applications, effort has been made to achieve Winograd's lower bound for the multiplication count. To obtain the minimum possible multiplication count, $x^N - 1$ must be factorized into N distinct factors in $Z_m$ as

$$x^N - 1 \equiv (x - r_0)(x - r_1) \cdots (x - r_{N-1}) \tag{4.58}$$

with $r_0, r_1, \ldots, r_{N-1} \in Z_m$. Then, there exists an isomorphic mapping $f_N$ of P(m) onto $Z_m \times \cdots \times Z_m \triangleq Z_m^N$, where P(m) is a finite structure containing (N-1)-order polynomials with coefficients in $Z_m$,

$$P(m) = \{A(x) = a_0 + a_1 x + \cdots + a_{N-1} x^{N-1} : a_i \in Z_m,\; i = 0, 1, \ldots, N-1, \text{ and rules of composition in } P(m) \text{ shown by equations (4.60), (4.61)}\} \tag{4.59}$$

For $A(x), B(x) \in P(m)$, with $A(x) = \sum_{i=0}^{N-1} a_i x^i$, $B(x) = \sum_{i=0}^{N-1} b_i x^i$, the rules of composition in P(m) are

$$\text{Addition:} \quad C(x) = A(x) + B(x) = \sum_{i=0}^{N-1} [(a_i + b_i) \bmod m] x^i \tag{4.60}$$

$$\text{Multiplication:} \quad C(x) = [A(x) \cdot B(x) \bmod (x^N - 1)] \quad \text{(arithmetic mod m)} \tag{4.61}$$

The isomorphic mapping $f_N$ maps the vector $(a_0, \ldots, a_{N-1})$ into $(a_0^*, \ldots, a_{N-1}^*)$ and satisfies equation (4.7), while the inverse mapping $f_N^{-1}$ is given by equations (4.8) and either (4.9) or (4.10). The rules of composition in such a PRNS, which is now based on the factorization of $(x^N - 1) \bmod m$, are similar to those of a PRNS based on the factorization of $(x^N + 1) \bmod m$ and are given by equations (4.11) and (4.12).

Several theorems are presented next to complete the study of this modified version of the PRNS. These theorems concern the choice of the modulus m so that $x^N - 1$ can be factorized into N distinct factors. They also concern relationships among the roots of $x^N - 1 \equiv 0 \pmod m$. Since they are very similar to the theorems presented in sections 4.2 and 4.3, most of the proofs are omitted.

Theorem 4.22: The congruence $x^N \equiv 1 \pmod p$, for prime p, has N distinct solutions if and only if $N \mid p - 1$.

Proof: It is given by Stark [Sta70a].

Theorem 4.23: If the polynomial $x^N - 1$ can be factorized into N distinct factors mod m as $x^N - 1 \equiv (x - r_0) \cdot (x - r_1) \cdots (x - r_{N-1}) \pmod m$, then for any prime p that is a factor of m and such that N < p, the roots $r_i$, i = 0, ..., N - 1, are distinct mod p.

Proof: Similar to proof of Theorem 4.5.

Theorem 4.24: If the polynomial $x^N - 1$ can be factorized into N distinct factors mod m as $x^N - 1 \equiv (x - r_0) \cdot (x - r_1) \cdots (x - r_{N-1}) \pmod m$, where $m = p_1^{e_1} \cdot p_2^{e_2} \cdots p_L^{e_L}$, with $N < p_i$, $p_i$ prime, then $N \mid p_i - 1$, i = 1, ..., L.

Proof: Similar to proof of Theorem 4.6.

Theorem 4.25: If N is a positive integer and p is prime and N < p, then there are N distinct roots $r_0, r_1, \ldots, r_{N-1} \in Z_{p^e}$ of the polynomial $x^N - 1$ such that

$$x^N - 1 \equiv (x - r_0)(x - r_1) \cdots (x - r_{N-1}) \pmod{p^e}$$

if and only if $N \mid p - 1$. This factorization of $x^N - 1$ into N distinct factors is unique.

Proof: Similar to proof of Theorem 4.10.



Theorem 4.26: If N, m are positive integers and the prime factorization of m is $m = p_1^{e_1} \cdot p_2^{e_2} \cdots p_L^{e_L}$, with $N < p_i$, $p_i$ prime, i = 1, ..., L, then there are distinct $r_0, r_1, \ldots, r_{N-1} \in Z_m$ such that $x^N - 1 \equiv (x - r_0)(x - r_1) \cdots (x - r_{N-1}) \pmod m$ if and only if $N \mid p_i - 1$ for every i = 1, ..., L. Such a factorization is not unique; there are $(N!)^{L-1}$ different factorizations.

Proof: Similar to proof of Theorem 4.1. The procedure for finding the $(N!)^{L-1}$ different factorizations is also the same as the one described there.

Corollary 4.3: If m = p is a prime number and N = 2^s, with s a positive
integer, then the necessary and sufficient condition for the
congruence x^N ≡ 1 mod p to have N distinct solutions is that p = 2^s · k +
1, with k a positive integer. In such a case the solutions of x^N ≡ 1
mod p are 1, -1, and the solutions of all the congruences x^{N/2^j} ≡
-1 mod p, where j = 1, 2, ..., s - 1, N = 2^s.

Proof: Theorem 4.22 implies that the congruence x^N ≡ 1 mod p for
prime p has N distinct solutions if and only if N | p - 1. For N = 2^s
it must be 2^s | p - 1 ⇔ p - 1 = 2^s · k ⇔ p = 2^s · k + 1, where k is a
positive integer. Using Theorem 4.25 we conclude that this set of
solutions is unique. Since N = 2^s then N/2 = 2^{s-1} and since p = 2^s
· k + 1 then from Corollary 4.1 the congruence x^{N/2} ≡ -1 mod p must
have N/2 distinct solutions, say r_0, r_1, ..., r_{N/2-1}. Then r_i^{N/2}
≡ -1 mod p, i = 0, 1, ..., N/2 - 1, so r_i^N ≡ 1 mod p and r_0, r_1, ...,
r_{N/2-1} are solutions of x^N ≡ 1 mod p.

The fact that N/4 = 2^{s-2} and p = 2^s · k + 1 = 2^{s-1} · 2 · k + 1 =
2^{s-1} · k_1 + 1 implies that the congruence x^{N/4} ≡ -1 mod p has N/4
distinct solutions, say q_0, q_1, ..., q_{N/4-1}, for which q_j^{N/4} ≡ -1
mod p, j = 0, ..., N/4 - 1, or q_j^N ≡ 1 mod p, and q_0, q_1, ..., q_{N/4-1}
are also solutions of x^N ≡ 1 mod p. Similarly, the solutions of all
the congruences x^{N/8} ≡ -1 mod p, x^{N/16} ≡ -1 mod p, ..., x^2 ≡ -1 mod p
are solutions of x^N ≡ 1 mod p.

The number of the roots of all the congruences of the form x^{N/2^j}
≡ -1 mod p, j = 1, 2, ..., s - 1, N = 2^s is

N/2 + N/4 + N/8 + ... + 2 = 2^{s-1} + 2^{s-2} + 2^{s-3} + ... + 2
                          = 2 (2^{s-1} - 1) / (2 - 1) = 2^s - 2 = N - 2

The other two solutions of x^N ≡ 1 mod p are obviously 1 and -1 since
1^N ≡ 1 mod p and (-1)^N = (-1)^{2^s} ≡ 1 mod p.
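
As an illustrative check of Corollary 4.3 (a sketch only; the function names roots_of_unity and roots_by_corollary are hypothetical and do not appear in the original text), the roots of x^N ≡ 1 mod p can be found by exhaustive search and compared with the roots the corollary predicts. The search is meant for small p only.

```python
def roots_of_unity(N, p):
    """All x in Z_p with x^N ≡ 1 (mod p), found by exhaustive search."""
    return sorted(x for x in range(1, p) if pow(x, N, p) == 1)


def roots_by_corollary(s, p):
    """Roots predicted by Corollary 4.3 for N = 2^s: the numbers 1 and -1
    together with the solutions of x^(N/2^j) ≡ -1 (mod p), j = 1, ..., s-1."""
    N = 2 ** s
    found = {1, p - 1}
    for j in range(1, s):
        exponent = N // 2 ** j
        found |= {x for x in range(1, p) if pow(x, exponent, p) == p - 1}
    return sorted(found)


# p = 17 = 2^3 * 2 + 1, so x^8 ≡ 1 mod 17 should have eight distinct roots.
print(roots_of_unity(8, 17))
print(roots_by_corollary(3, 17))
```

Both calls list the same eight residues for p = 17, in agreement with the count N/2 + N/4 + ... + 2 + 2 = N given in the proof.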

Corollary 4.4: If the prime decomposition of m is m = p_1^{e_1} · p_2^{e_2} ·
... · p_L^{e_L}, p_i prime, i = 1, 2, ..., L and N = 2^s, where s is a positive
integer, then the necessary and sufficient condition for the
congruence x^N ≡ 1 mod m to have N distinct solutions in Z_m is that p_i
is of the form 2^s · k_i + 1, i = 1, 2, ..., L, with k_i a positive integer.

Proof: Theorem 4.26 implies that the congruence x^N ≡ 1 mod m has N
distinct solutions in Z_m if and only if N | p_i - 1 for every i = 1, 2,
..., L. For N = 2^s it must be 2^s | p_i - 1 ⇔ p_i - 1 = 2^s · k_i ⇔ p_i =
2^s · k_i + 1, k_i positive integer, i = 1, 2, ..., L.

Theorem 4.27: If x^N - 1 can be factorized into N distinct factors
mod m as x^N - 1 ≡ (x - r_0)(x - r_1) ... (x - r_{N-1}) mod m, then the
inverses of the powers r_i^j of the roots exist for every i = 0, 1, ..., N - 1 and
j = 1, 2, ..., N - 1 and are given by r_i^{-j} ≡ r_i^{N-j} mod m.

Proof: Since r_i is a root of x^N ≡ 1 mod m then r_i^N ≡ 1 mod m or r_i^{N-j}
· r_i^j ≡ 1 mod m. This shows that r_i^j and r_i^{N-j} are multiplicative
inverses of each other modulo m. Then,

r_i^{-j} ≡ r_i^{N-j} mod m,   i = 0, ..., N - 1,   j = 1, ..., N - 1      (4.62)

Theorem 4.28: If N, m are positive integers and the prime
factorization of m is m = p_1^{e_1} · p_2^{e_2} ... p_L^{e_L} with N < p_k, p_k prime,
and if N | p_k - 1 for k = 1, 2, ..., L, in which case x^N - 1 can be
factorized into N distinct factors modulo m as x^N - 1 ≡ (x - r_0)(x -
r_1) ... (x - r_{N-1}), then

Σ_{i=0}^{N-1} r_i^j ≡ 0 mod m,   j = 1, ..., N - 1      (4.63)

and

Π_{i=0}^{N-1} r_i ≡ -(-1)^N mod m      (4.64)

Proof: Since x^N - 1 ≡ (x - r_0)(x - r_1) ... (x - r_{N-1}) mod m, then
by multiplying out the right-hand side of the above equation we obtain

x^N - 1 ≡ x^N - (Σ_i r_i) x^{N-1} + (Σ_{i<j} r_i r_j) x^{N-2} - (Σ_{i<j<k} r_i r_j r_k) x^{N-3}
          + ... + (-1)^N r_0 · r_1 · ... · r_{N-1} mod m

Therefore, (-1)^N · Π_{i=0}^{N-1} r_i ≡ -1 mod m or Π_{i=0}^{N-1} r_i ≡ -(-1)^N mod m and
(4.64) is proven.

To prove (4.63) simply repeat the proof of Theorem 4.15.


Theorem 4.29: For the same hypothesis as in Theorem 4.28

Σ_{i=0}^{N-1} r_i^{-j} ≡ 0 mod m,   j = 1, ..., N - 1      (4.65)

Proof: By Theorem 4.27 (equation (4.62)) it is r_i^{-j} ≡ r_i^{N-j} mod m.
Then Σ_{i=0}^{N-1} r_i^{-j} ≡ Σ_{i=0}^{N-1} r_i^{N-j} mod m ≡ 0 mod m (Theorem 4.28), q.e.d.

Theorem 4.30: If N is even and r_i satisfies the congruence x^N - 1 ≡
0 mod m, then the additive inverse -r_i mod m of r_i satisfies x^N - 1 ≡
0 mod m.

Proof: Since r_i satisfies x^N - 1 ≡ 0 mod m, then r_i^N ≡ 1 mod m. But
since N is even, then (-r_i)^N = (-1)^N · r_i^N mod m ≡ r_i^N mod m ≡ 1 mod m.
So (-r_i)^N ≡ 1 mod m and -r_i satisfies x^N - 1 ≡ 0 mod m.

Theorem 4.31: If r_i satisfies x^N - 1 ≡ 0 mod m, then the
multiplicative inverse r_i^{-1} mod m of r_i satisfies x^N - 1 ≡ 0 mod m.

Proof: Since r_i satisfies x^N - 1 ≡ 0 mod m, then r_i^N ≡ 1 mod m. From
Theorem 4.27 (equation (4.62)) r_i^{-1} ≡ r_i^{N-1} mod m and then (r_i^{-1})^N ≡
(r_i^{N-1})^N = (r_i^N)^{N-1} ≡ 1^{N-1} = 1 mod m. So (r_i^{-1})^N ≡ 1 mod m and r_i^{-1}
satisfies x^N - 1 ≡ 0 mod m.

It follows now from Theorems 4.30 and 4.31 that if N is even and
m = p, prime (in which case x^N - 1 ≡ 0 mod p has a unique set of
roots), then the roots of x^N - 1 ≡ 0 mod p are r_0 = 1, r_1 ≡ -1 and
the remaining N - 2 roots appear in groups of multiplicative
inverses r_2, r_2^{-1}, r_3, r_3^{-1}, ..., r_k, r_k^{-1} or in groups of additive
inverses r_2, -r_2, r_3, -r_3, ..., r_k, -r_k.

If N is odd and m = p, prime, then the roots of x^N - 1 ≡ 0 mod p
are r_0 = 1 and the remaining N - 1 roots appear in groups of
multiplicative inverses r_1, r_1^{-1}, r_2, r_2^{-1}, ..., r_k, r_k^{-1}.

The following examples demonstrate the validity of the above
theorems on the factorization of x^N - 1 in Z_m.

Example 4.7: Factorize x^3 - 1 mod 13.

Here, the order of the polynomial is N = 3 and the prime modulus
is p = 13. Since N | p - 1 (because 3 | 12), Theorem 4.22 implies
that the congruence x^3 ≡ 1 mod 13 should have three distinct
solutions in Z_13. These are r_0 = 1, r_1 = 3, r_2 = 9 (because 1^3 ≡ 1
mod 13, 3^3 = 27 ≡ 1 mod 13 and 9^3 = 81 · 9 ≡ 3 · 9 = 27 ≡ 1 mod 13). So
x^3 - 1 ≡ (x - 1)(x - 3)(x - 9) mod 13. This set of solutions is
also unique. Observe that r_0 + r_1 + r_2 = 1 + 3 + 9 = 13 ≡ 0 mod 13,
r_0^2 + r_1^2 + r_2^2 = 1 + 9 + 81 = 91 ≡ 0 mod 13 and r_0 · r_1 · r_2 = 1 ·
3 · 9 = 27 ≡ 1 mod 13, which is in agreement with Theorem 4.28. Also
r_0^{-1} = 1^{-1} ≡ 1 = 1^2 = r_0^2, r_1^{-1} = 3^{-1} ≡ 9 = 3^2 = r_1^2, r_2^{-1} = 9^{-1} ≡ 3
≡ 9^2 = r_2^2, which is in agreement with Theorem 4.27. Theorem 4.29
is satisfied by the fact that r_0^{-1} + r_1^{-1} + r_2^{-1} = 1 + 9 + 3 = 13 ≡
0 mod 13 and r_0^{-2} + r_1^{-2} + r_2^{-2} = 1 + 81 + 9 = 91 ≡ 0 mod 13.
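
The identities verified by hand in this example can also be checked mechanically. The following sketch (the helper name check_root_identities is hypothetical, not from the original text) tests the relations of Theorems 4.27, 4.28 and 4.29 for a given root set; it assumes Python 3.8 or later, where the built-in pow accepts a negative exponent together with a modulus.

```python
def check_root_identities(roots, N, m):
    """Return True if the given roots of x^N - 1 mod m satisfy
    (4.62), (4.63), (4.64) and (4.65)."""
    # Theorem 4.27: r^j and r^(N-j) are multiplicative inverses.
    inverse_ok = all(pow(r, j, m) * pow(r, N - j, m) % m == 1
                     for r in roots for j in range(1, N))
    # Theorem 4.28, equation (4.63): power sums vanish for j = 1..N-1.
    power_sums_ok = all(sum(pow(r, j, m) for r in roots) % m == 0
                        for j in range(1, N))
    # Theorem 4.29, equation (4.65): sums of inverse powers vanish too.
    inverse_sums_ok = all(sum(pow(r, -j, m) for r in roots) % m == 0
                          for j in range(1, N))
    # Theorem 4.28, equation (4.64): product of the roots is -(-1)^N.
    product = 1
    for r in roots:
        product = product * r % m
    product_ok = product == (-((-1) ** N)) % m
    return inverse_ok and power_sums_ok and inverse_sums_ok and product_ok


print(check_root_identities([1, 3, 9], N=3, m=13))
```

The same function can be applied to any other root set of x^N - 1 mod m, provided every root is invertible modulo m.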
Example 4.8: Factorize x^5 - 1 mod 121.

The order of the polynomial is N = 5 and the modulus is m = p^2 =
11^2 = 121, where p = 11 is a prime. Since N | p - 1 (5 | 10), then from
Theorem 4.25 the congruence x^5 ≡ 1 mod 121 has five distinct
solutions in Z_121, which are also unique. The number r_0 = 1 is such
a solution since r_0^5 = 1^5 ≡ 1 mod 121 and so is the number r_1 = 3
since r_1^5 = 3^5 = 81 · 3 = 243 ≡ 1 mod 121. But Theorem 4.31 implies
that r_1^{-1} must also be a solution. Then r_2 ≡ r_1^{-1} = 3^{-1} ≡ 81, so r_2
≡ 81 satisfies x^5 ≡ 1 mod 121. The other two solutions are r_3 = 9
(since 9^5 = (3^2)^5 = (3^5)^2 ≡ 1^2 = 1) and r_4 = r_3^{-1} = 9^{-1} ≡ 27. Then
x^5 - 1 ≡ (x - 1)(x - 3)(x - 81)(x - 9)(x - 27) mod 121. It can
be easily shown that Σ_{i=0}^{4} r_i^j ≡ 0 mod 121, j = 1, ..., 4 and Π_{i=0}^{4} r_i ≡
1 as required by Theorem 4.28, while the reader is encouraged to
double check Theorems 4.27 and 4.29 for the above five roots.
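
A short sketch, under the assumption that exhaustive search is acceptable for a modulus this small, can confirm both the root set and the factorization. The helper name expand_linear_factors is hypothetical; it multiplies out the product of the factors (x - r_i) with coefficients reduced modulo m.

```python
def expand_linear_factors(roots, m):
    """Coefficients (lowest degree first) of the product of (x - r) over Z_m."""
    poly = [1]
    for r in roots:
        nxt = [0] * (len(poly) + 1)
        for k, c in enumerate(poly):
            nxt[k + 1] = (nxt[k + 1] + c) % m   # contribution of c * x
            nxt[k] = (nxt[k] - c * r) % m       # contribution of c * (-r)
        poly = nxt
    return poly


m, N = 121, 5
roots = [x for x in range(m) if pow(x, N, m) == 1]
print(roots)                             # the five roots of x^5 ≡ 1 mod 121
print(expand_linear_factors(roots, m))   # coefficients of x^5 - 1 mod 121
```

The expanded product has constant term 120 ≡ -1, leading coefficient 1 and zeros in between, that is, x^5 - 1 in Z_121.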

Example 4.9: Find the roots of x^8 ≡ 1 mod 17.

The order of the congruence is N = 8 = 2^3 and the prime modulus
is p = 17 = 8 · 2 + 1 = 2^3 · 2 + 1. Corollary 4.3 implies that the
congruence x^8 ≡ 1 mod 17 has eight distinct solutions in Z_17.
According to the same Corollary 4.3 these eight roots are 1, -1, the
roots of the congruence x^4 ≡ -1 mod 17 and the roots of the
congruence x^2 ≡ -1 mod 17, or r_0 = 1, r_1 ≡ 16, r_2 = 2, r_3 = 15, r_4 =
9, r_5 = 8, r_6 = 4, r_7 ≡ 13. Since N = 8 is even, the additive
inverses of the roots should be roots as well (Theorem 4.30).
Indeed -r_0 ≡ r_1, -r_2 ≡ r_3, -r_4 ≡ r_5, -r_6 ≡ r_7. According to
Theorem 4.31 the multiplicative inverses of the above roots should
also be roots of x^8 - 1 ≡ 0 mod 17, which is true since r_0^{-1} = 1^{-1} =
1 ≡ r_0, r_1^{-1} = 16^{-1} ≡ 16 ≡ r_1, r_2^{-1} = 2^{-1} ≡ 9 ≡ r_4, r_3^{-1} ≡ 15^{-1} ≡ 8
≡ r_5, r_6^{-1} = 4^{-1} ≡ 13 ≡ r_7.


Example 4.10: Factorize x^3 - 1 mod 91.

The order of the polynomial is N = 3 and the modulus is m = 91 =
7 · 13 = p_1^1 · p_2^1 = m_1 · m_2. Noting that N | p_1 - 1 and N | p_2 - 1
(since 3 | 6, 3 | 12) and using Theorem 4.26 we find that x^3 - 1 can be
factorized into three distinct factors mod 91 in (N!)^{L-1} = (3!)^{2-1} =
3! = 6 different ways.

For m_1 = p_1 = 7, the congruence x^3 ≡ 1 mod 7 has three distinct
and unique solutions r_{1,0} ≡ 1, r_{1,1} ≡ 2, r_{1,2} ≡ 4, while for m_2 = p_2 =
13, the congruence x^3 ≡ 1 mod 13 has the solutions r_{2,0} ≡ 1, r_{2,1} ≡ 3
and r_{2,2} ≡ 9.

Consider the matrix

A_1 = | r_{1,0}  r_{1,1}  r_{1,2} |
      | r_{2,0}  r_{2,1}  r_{2,2} |

Taking all possible permutations of row 2 of matrix A_1 and leaving
row 1 untouched, the following six matrices (including A_1) are
obtained:

A_1 = | r_{1,0}  r_{1,1}  r_{1,2} |     A_2 = | r_{1,0}  r_{1,1}  r_{1,2} |     A_3 = | r_{1,0}  r_{1,1}  r_{1,2} |
      | r_{2,0}  r_{2,1}  r_{2,2} |           | r_{2,0}  r_{2,2}  r_{2,1} |           | r_{2,1}  r_{2,0}  r_{2,2} |

A_4 = | r_{1,0}  r_{1,1}  r_{1,2} |     A_5 = | r_{1,0}  r_{1,1}  r_{1,2} |     A_6 = | r_{1,0}  r_{1,1}  r_{1,2} |
      | r_{2,2}  r_{2,0}  r_{2,1} |           | r_{2,1}  r_{2,2}  r_{2,0} |           | r_{2,2}  r_{2,1}  r_{2,0} |

By applying the Chinese Remainder Theorem on the columns of the
above six matrices we can get the six different sets of roots and
factorizations of x^3 - 1 mod 91. Matrix A_1 gives r_0 ≡ (r_{1,0} · M_1 · N_1
+ r_{2,0} · M_2 · N_2) mod m where M_1 = m/m_1 = p_2 = 13, N_1 ≡ M_1^{-1} mod p_1 ≡
13^{-1} mod 7 ≡ 6, M_2 = m/m_2 = p_1 = 7 and N_2 ≡ M_2^{-1} mod p_2 ≡ 7^{-1} mod 13
≡ 2.

So

Matrix A_1
r_0  ≡ (r_{1,0}·M_1·N_1 + r_{2,0}·M_2·N_2) mod m ≡ (1·13·6 + 1·7·2) mod 91 ≡ 92  ≡ 1
r_1  ≡ (r_{1,1}·M_1·N_1 + r_{2,1}·M_2·N_2) mod m ≡ (2·13·6 + 3·7·2) mod 91 ≡ 198 ≡ 16
r_2  ≡ (r_{1,2}·M_1·N_1 + r_{2,2}·M_2·N_2) mod m ≡ (4·13·6 + 9·7·2) mod 91 ≡ 438 ≡ 74

The other matrices give the following sets of roots:

Matrix A_2
r_3  ≡ (r_{1,0}·M_1·N_1 + r_{2,0}·M_2·N_2) mod m ≡ (1·13·6 + 1·7·2) mod 91 ≡ 92  ≡ 1
r_4  ≡ (r_{1,1}·M_1·N_1 + r_{2,2}·M_2·N_2) mod m ≡ (2·13·6 + 9·7·2) mod 91 ≡ 282 ≡ 9
r_5  ≡ (r_{1,2}·M_1·N_1 + r_{2,1}·M_2·N_2) mod m ≡ (4·13·6 + 3·7·2) mod 91 ≡ 354 ≡ 81

Matrix A_3
r_6  ≡ (r_{1,0}·M_1·N_1 + r_{2,1}·M_2·N_2) mod m ≡ (1·13·6 + 3·7·2) mod 91 ≡ 120 ≡ 29
r_7  ≡ (r_{1,1}·M_1·N_1 + r_{2,0}·M_2·N_2) mod m ≡ (2·13·6 + 1·7·2) mod 91 ≡ 170 ≡ 79
r_8  ≡ (r_{1,2}·M_1·N_1 + r_{2,2}·M_2·N_2) mod m ≡ (4·13·6 + 9·7·2) mod 91 ≡ 438 ≡ 74

Matrix A_4
r_9  ≡ (r_{1,0}·M_1·N_1 + r_{2,2}·M_2·N_2) mod m ≡ (1·13·6 + 9·7·2) mod 91 ≡ 204 ≡ 22
r_10 ≡ (r_{1,1}·M_1·N_1 + r_{2,0}·M_2·N_2) mod m ≡ (2·13·6 + 1·7·2) mod 91 ≡ 170 ≡ 79
r_11 ≡ (r_{1,2}·M_1·N_1 + r_{2,1}·M_2·N_2) mod m ≡ (4·13·6 + 3·7·2) mod 91 ≡ 354 ≡ 81

Matrix A_5
r_12 ≡ (r_{1,0}·M_1·N_1 + r_{2,1}·M_2·N_2) mod m ≡ (1·13·6 + 3·7·2) mod 91 ≡ 120 ≡ 29
r_13 ≡ (r_{1,1}·M_1·N_1 + r_{2,2}·M_2·N_2) mod m ≡ (2·13·6 + 9·7·2) mod 91 ≡ 282 ≡ 9
r_14 ≡ (r_{1,2}·M_1·N_1 + r_{2,0}·M_2·N_2) mod m ≡ (4·13·6 + 1·7·2) mod 91 ≡ 326 ≡ 53

Matrix A_6
r_15 ≡ (r_{1,0}·M_1·N_1 + r_{2,2}·M_2·N_2) mod m ≡ (1·13·6 + 9·7·2) mod 91 ≡ 204 ≡ 22
r_16 ≡ (r_{1,1}·M_1·N_1 + r_{2,1}·M_2·N_2) mod m ≡ (2·13·6 + 3·7·2) mod 91 ≡ 198 ≡ 16
r_17 ≡ (r_{1,2}·M_1·N_1 + r_{2,0}·M_2·N_2) mod m ≡ (4·13·6 + 1·7·2) mod 91 ≡ 326 ≡ 53


The six different sets of roots and factorizations of x^3 - 1 mod
91 are shown below:

Set 1 = {1, 16, 74}   →  x^3 - 1 ≡ (x - 1)(x - 16)(x - 74)
Set 2 = {1, 9, 81}    →  x^3 - 1 ≡ (x - 1)(x - 9)(x - 81)
Set 3 = {29, 79, 74}  →  x^3 - 1 ≡ (x - 29)(x - 79)(x - 74)
Set 4 = {22, 79, 81}  →  x^3 - 1 ≡ (x - 22)(x - 79)(x - 81)
Set 5 = {29, 9, 53}   →  x^3 - 1 ≡ (x - 29)(x - 9)(x - 53)
Set 6 = {22, 16, 53}  →  x^3 - 1 ≡ (x - 22)(x - 16)(x - 53)

Theorems 4.28 and 4.29 hold true within each of the six sets
of roots. For example, for set 1 = {1, 16, 74} observe that 1 + 16
+ 74 = 91 ≡ 0, 1^2 + 16^2 + 74^2 ≡ 1 + 256 + (-17)^2 = 1 + 256 + 289 =
546 ≡ 0 and 1 · 16 · 74 ≡ 1 · 16 · (-17) = -272 ≡ -(-1) = 1.

For the union of the six sets of roots we get S = {1, 16, 74, 9,
81, 29, 79, 22, 53}, which forms the set of all the numbers that
satisfy x^3 - 1 ≡ 0 mod 91. To demonstrate the validity of Theorem
4.31 we must show that the multiplicative inverse of any number in S
also belongs in S. Since r_i^{-1} ≡ r_i^{N-1} mod m (Theorem 4.27, equation
(4.62)), then

1^{-1} ≡ 1^2 = 1, 16^{-1} ≡ 16^2 = 256 ≡ 74, 74^{-1} ≡ 74^2 ≡ 16, 9^{-1} ≡ 9^2 =
81, 81^{-1} ≡ 81^2 ≡ 9, 29^{-1} ≡ 29^2 ≡ 22, 79^{-1} ≡ 79^2 ≡ 53, 22^{-1} ≡ 22^2 ≡
29, 53^{-1} ≡ 53^2 ≡ 79, and the produced inverses 1, 74, 16, 81, 9, 22,
53, 29, 79 belong in S.
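
The construction used in this example (fix the roots modulo 7, permute the roots modulo 13, and combine column by column with the Chinese Remainder Theorem) can be sketched as follows. The function name crt_pair and the variable names are hypothetical; the sketch assumes Python 3.8 or later for modular inverses via pow.

```python
from itertools import permutations


def crt_pair(a1, m1, a2, m2):
    """Combine x ≡ a1 (mod m1) and x ≡ a2 (mod m2), m1 and m2 coprime."""
    m = m1 * m2
    M1, M2 = m // m1, m // m2     # M_i = m / m_i
    N1 = pow(M1, -1, m1)          # N_i = M_i^{-1} mod m_i
    N2 = pow(M2, -1, m2)
    return (a1 * M1 * N1 + a2 * M2 * N2) % m


row1 = [1, 2, 4]   # roots of x^3 ≡ 1 mod 7, kept in a fixed order
row2 = [1, 3, 9]   # roots of x^3 ≡ 1 mod 13, permuted in all 3! ways

root_sets = [[crt_pair(a, 7, b, 13) for a, b in zip(row1, perm)]
             for perm in permutations(row2)]
for roots in root_sets:
    print(roots)

# Union of all six sets: every solution of x^3 ≡ 1 mod 91.
union = sorted({r for roots in root_sets for r in roots})
print(union)
```

Each printed triple is one admissible root set, and the union contains N^L = 9 residues, which matches the set S discussed above.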

Theorem 4.32: If the polynomial x^N - 1 can be factorized into N
distinct factors in Z_m as x^N - 1 ≡ (x - r_0)(x - r_1) ... (x - r_{N-1}),
then the PRNS mapping f_N from P(m) onto Z_m x Z_m x ... x Z_m = Z_m^N
defined by equation (4.7) is an isomorphism.

Proof: Similar to proof of Theorem 4.17.

Theorem 4.33: The mapping f_N^{-1}: (a_0*, a_1*, ..., a_{N-1}*) → A(x)
such that A(x) = (Σ_{i=0}^{N-1} a_i* · Q_i(x)) mod (x^N - 1), with Q_i(x) as in
equation (4.9), is the inverse mapping of f_N described by equation
(4.7).

Proof: Similar to proof of Theorem 4.18.

Theorem 4.34: If the polynomial x^N - 1 is factored in Z_m as x^N - 1 ≡
(x - r_0)(x - r_1) ... (x - r_{N-1}) mod m, then the polynomials Q_i(x) =
N^{-1} (1 + r_i^{-1} x + r_i^{-2} x^2 + ... + r_i^{-(N-2)} x^{N-2} + r_i^{-(N-1)} x^{N-1}) also
satisfy Q_i(x) ≡ δ_ij mod (x - r_j), i, j = 0, ..., N - 1.

Proof: Similar to proof of Theorem 4.19.

Theorems 4.32 and 4.33 justify the simplified rules of
composition in this modified version of the PRNS. These simplified
rules are given by equations (4.11) and (4.12).

Example 4.11: Perform P_1(x) · P_2(x) mod (x^3 - 1) in Z_7.

The main goal of this example is to demonstrate the validity of
Theorems 4.32, 4.33, and 4.34.

The roots of the congruence x^3 - 1 ≡ 0 mod 7 are r_0 ≡ 1, r_1 ≡ 2
and r_2 ≡ 4, and x^3 - 1 ≡ (x - 1)(x - 2)(x - 4). Consider the
second order polynomial A(x) = a_0 + a_1 x + a_2 x^2. Then the forward
isomorphic mapping f_3 maps A(x) = (a_0, a_1, a_2) onto (a_0*, a_1*,
a_2*), where

a_0* = A(x) mod (x - r_0) = A(r_0) = A(1) = a_0 + a_1 + a_2
a_1* = A(x) mod (x - r_1) = A(r_1) = A(2) = a_0 + 2a_1 + 4a_2      (4.66)
a_2* = A(x) mod (x - r_2) = A(r_2) = A(4) = a_0 + 4a_1 + 2a_2

The inverse mapping f_3^{-1} maps (a_0*, a_1*, a_2*) onto A(x) =
(Σ_{i=0}^{2} a_i* · Q_i(x)) mod (x^3 - 1), where the Q_i(x) are given by
equation (4.9) or, in Z_7,

Q_0(x) = 3^{-1}(1 + r_0^{-1} x + r_0^{-2} x^2) = 5(1 + 1^{-1} x + 1^{-2} x^2) = 5(1 + x + x^2)
Q_1(x) = 3^{-1}(1 + r_1^{-1} x + r_1^{-2} x^2) = 5(1 + 2^{-1} x + 2^{-2} x^2) = 5(1 + 4x + 2x^2)      (4.67)
Q_2(x) = 3^{-1}(1 + r_2^{-1} x + r_2^{-2} x^2) = 5(1 + 4^{-1} x + 4^{-2} x^2) = 5(1 + 2x + 4x^2)

To demonstrate the validity of Theorem 4.34 observe that

Q_0(x) mod (x - r_0) = Q_0(r_0) = Q_0(1) = 5(1 + 1 + 1)   = 5 · 3  = 15 ≡ 1 mod 7
Q_0(x) mod (x - r_1) = Q_0(r_1) = Q_0(2) = 5(1 + 2 + 4)   = 5 · 7  = 35 ≡ 0 mod 7
Q_0(x) mod (x - r_2) = Q_0(r_2) = Q_0(4) = 5(1 + 4 + 16)  = 5 · 21 ≡ 0 mod 7
Q_1(x) mod (x - r_0) = Q_1(r_0) = Q_1(1) = 5(1 + 4 + 2)   = 5 · 7  ≡ 0 mod 7
Q_1(x) mod (x - r_1) = Q_1(r_1) = Q_1(2) = 5(1 + 8 + 8)   = 5 · 17 ≡ 5 · 3 ≡ 1 mod 7
Q_1(x) mod (x - r_2) = Q_1(r_2) = Q_1(4) = 5(1 + 16 + 32) = 5 · 49 ≡ 0 mod 7
Q_2(x) mod (x - r_0) = Q_2(r_0) = Q_2(1) = 5(1 + 2 + 4)   = 5 · 7  ≡ 0 mod 7
Q_2(x) mod (x - r_1) = Q_2(r_1) = Q_2(2) = 5(1 + 4 + 16)  = 5 · 21 ≡ 0 mod 7
Q_2(x) mod (x - r_2) = Q_2(r_2) = Q_2(4) = 5(1 + 8 + 64)  = 5 · 73 ≡ 5 · 3 = 15 ≡ 1 mod 7

and it follows that Q_i(x) ≡ δ_ij mod (x - r_j).

Consider now the multiplication of P_1(x) = a_0 + a_1 x + a_2 x^2 = 5 +
6x + 4x^2 with P_2(x) = b_0 + b_1 x + b_2 x^2 = 4 + 3x + 5x^2 modulo (x^3 - 1).
The arithmetic is performed in Z_7. According to equation (4.66)

a_0* ≡ (5 + 6 + 4) mod 7 ≡ 1          b_0* ≡ (4 + 3 + 5) mod 7 ≡ 5
a_1* ≡ (5 + 12 + 16) mod 7 ≡ 5        b_1* ≡ (4 + 6 + 20) mod 7 ≡ 2
a_2* ≡ (5 + 24 + 8) mod 7 ≡ 2         b_2* ≡ (4 + 12 + 10) mod 7 ≡ 5

The polynomial product in the PRNS is given by c_0*, c_1*, c_2* where

c_0* ≡ (a_0* · b_0*) mod 7 ≡ (1 · 5) mod 7 ≡ 5
c_1* ≡ (a_1* · b_1*) mod 7 ≡ (5 · 2) mod 7 ≡ 3
c_2* ≡ (a_2* · b_2*) mod 7 ≡ (2 · 5) mod 7 ≡ 3

The inverse mapping f_3^{-1} maps (c_0*, c_1*, c_2*) into c(x) = c_0 +
c_1 x + c_2 x^2, where c(x) = c_0* Q_0(x) + c_1* Q_1(x) + c_2* Q_2(x) with
Q_0(x), Q_1(x) and Q_2(x) given by equation (4.67). Then

c_0 ≡ 5 · (c_0* + c_1* + c_2*) mod 7 ≡ 5 · (5 + 3 + 3) mod 7 ≡ 55 mod 7 ≡ 6
c_1 ≡ 5 · (c_0* + 4c_1* + 2c_2*) mod 7 ≡ 5 · (5 + 12 + 6) mod 7 ≡ 115 mod 7 ≡ 3
c_2 ≡ 5 · (c_0* + 2c_1* + 4c_2*) mod 7 ≡ 5 · (5 + 6 + 12) mod 7 ≡ 115 mod 7 ≡ 3

and

P_1(x) · P_2(x) mod (x^3 - 1) = (5 + 6x + 4x^2) · (4 + 3x + 5x^2) mod (x^3 - 1) ≡ 6 + 3x + 3x^2

To verify the results, note that in Z_7

(5 + 6x + 4x^2) · (4 + 3x + 5x^2) mod (x^3 - 1)
= (20 + 24x + 16x^2 + 15x + 18x^2 + 12x^3 + 25x^2 + 30x^3 + 20x^4) mod (x^3 - 1)
= 20 + 24x + 16x^2 + 15x + 18x^2 + 12 + 25x^2 + 30 + 20x
≡ (62 mod 7) + (59 mod 7)x + (59 mod 7)x^2 ≡ 6 + 3x + 3x^2
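
The steps of this example (forward mapping, channel-wise multiplication, inverse mapping) can be sketched in a few lines. This is only an illustration under stated assumptions: the function names are hypothetical, the roots are supplied by the caller, and the inverse uses the coefficient form a_k ≡ N^{-1} Σ_i a_i* r_i^{-k} that follows from the polynomials Q_i(x) of Theorem 4.34. Python 3.8 or later is assumed for negative exponents in pow.

```python
def prns_forward(coeffs, roots, m):
    """Forward mapping: a_i* = A(r_i) mod m for every root r_i."""
    return [sum(c * pow(r, k, m) for k, c in enumerate(coeffs)) % m
            for r in roots]


def prns_inverse(residues, roots, m):
    """Inverse mapping: a_k = N^{-1} * sum_i a_i* * r_i^{-k} mod m."""
    n = len(roots)
    n_inv = pow(n, -1, m)
    return [n_inv * sum(s * pow(r, -k, m) for s, r in zip(residues, roots)) % m
            for k in range(n)]


def prns_multiply(a, b, roots, m):
    """Product of A(x) and B(x) mod (x^N - 1) using N independent channels."""
    a_star = prns_forward(a, roots, m)
    b_star = prns_forward(b, roots, m)
    c_star = [(x * y) % m for x, y in zip(a_star, b_star)]
    return prns_inverse(c_star, roots, m)


def cyclic_multiply(a, b, m):
    """Reference: direct product mod (x^N - 1) with coefficients in Z_m."""
    n = len(a)
    c = [0] * n
    for i, ai in enumerate(a):
        for k, bk in enumerate(b):
            c[(i + k) % n] = (c[(i + k) % n] + ai * bk) % m
    return c


P1, P2, roots, m = [5, 6, 4], [4, 3, 5], [1, 2, 4], 7
print(prns_multiply(P1, P2, roots, m))
print(cyclic_multiply(P1, P2, m))
```

Both calls return the coefficient list of 6 + 3x + 3x^2, the result obtained above. The PRNS path needs only N = 3 modular multiplications between the two mappings, one per channel.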


CHAPTER  5 

PRNS  COMPLEX  AND  REAL  ARITHMETIC  ALUS 


In  this  chapter,  both  complex  and  real  arithmetic  will  be 
examined  in  the  context  of  the  Polynomial  Residue  Number  System.  In 
the  case  of  complex  arithmetic,  existing  algorithms  can  be  employed 
to  approximate  numbers  with  polynomials  which  have  reduced 
wordlength  requirements.  Using  the  PRNS,  such  polynomials  can  be 
multiplied  in  parallel  channels,  thus  obtaining  higher  throughputs 
with  less  hardware. 

In the case of real arithmetic the decomposition of the product
(A · B) mod m into polynomial channels will be discussed. If
arithmetic has to be performed in some modular ring Z_m, then A, B ∈
Z_m. The above product can be realized as a product of two
polynomials, with arithmetic now performed in a different modular
ring, Z_{m_1}, where m_1 < m. This decomposition will result in faster
and less expensive real multiplication.

5.1 PRNS Complex Arithmetic ALUs


5.1.1  QRNS  Single  Modulus  Complex  ALU 

The desirable simplicity of the complex multiplication offered
by the QRNS (as discussed in Chapter 3) resulted in the design of a
QRNS Single-Modulus (SM) Complex Arithmetic Logic Unit (ALU). Such
a design was offered by Taylor [Tay85a]. The advantage of a
Single-Modulus design over a multi-moduli design is that it
provides easy magnitude scaling, sign detection and overflow
detection, while in a multi-moduli design these operations are much
more complex. However, compared to a conventional radix-2 system
(sign-magnitude, two's complement), a Single-Modulus approach has
the following disadvantages: more specialized hardware, more
complex zero detection and more complex magnitude comparison.

Two different cases will be briefly presented now. One is based
on table lookups while the other uses conventional hardware. In
both cases all the necessary units will be examined. These include
the forward (RNS to QRNS) mapping unit, the QRNS adder, the QRNS
multiplier and the inverse (QRNS to RNS) mapping unit. In both
cases a single modulus m will be considered, where the modulus m
has a 2n-bit representation. The reason for choosing a 2n-bit
modulus for our Single-Modulus QRNS (SM/QRNS) is that, as will be
explained in the next section, this system is equivalent to
a Single-Modulus PRNS (SM/PRNS) with an n-bit modulus choice. In
the next section, a comparison of a 2n-bit SM/QRNS with an n-bit
SM/PRNS will be presented.

Case  (i):  Table  lookups. 

Figures 5.1 and 5.2 show the table lookup implementations of the
forward and inverse QRNS mappings (f_2, f_2^{-1}) as described by
equations (3.1) and (3.2).

Note: <x>_m = x mod m

Figure 5.1 Table lookup implementation of the forward mapping f_2

It can be seen that both the forward and the inverse QRNS
mappings require two table lookup operations. The choice of a
2n-bit modulus requires a hardware investment of 2^{4n} x 2n bits per
memory table. Equations (3.7) show that both the complex QRNS
addition and multiplication require two table lookup operations
modulo m.


Figure 5.2 Table lookup implementation of the inverse
mapping f_2^{-1} (inputs z and z*; the legible output label is
<2^{-1}(z + z*)>_m)

The memory requirements for a SM/QRNS Complex Arithmetic unit based
on a 2n-bit modulus m are shown in Table 5.1.

Table 5.1 Memory requirements for a SM/QRNS Complex
Arithmetic unit based on a 2n-bit modulus

Function            Memory Size in Bits

Mapping             2^{4n+2} x n
Adder               2^{4n+2} x n
Multiplier          2^{4n+2} x n
Inverse Mapping     2^{4n+2} x n

Case (ii): Conventional hardware.

If a large-wordlength arithmetic system is required (i.e.
wordlength > 8 bits), conventional hardware realizations might be
more attractive than table lookup methods because of the higher cost
and slowness of large tables. For the modulus to be acceptable, it
has to satisfy certain mathematical conditions. Namely, if the
modulus m is a prime number p, then the QRNS mappings exist if p =
4k + 1, with k a positive integer (Theorem 3.1). When hardware
design elegance is an issue, the following moduli are often
referenced: p_1 = 2^n - 1, p_2 = 2^n, p_3 = 2^n + 1. If we equate p_1 =
2^n - 1 to p = 4k + 1 we get 2^n - 1 = 4k + 1 ⇒ k = 2^{n-2} - 1/2. Such
a k is not an integer and therefore is inadmissible. For the second
modulus choice p_2 = 2^n, the multiplicative inverse of two (i.e. 2^{-1})
does not exist and therefore the inverse mapping found in equation
(3.2) cannot be defined. The third choice p_3 = 2^n + 1 gives p = 2^n
+ 1 = 4k + 1 ⇔ k = 2^{n-2}, an integer. As a result, p = 2^n + 1 is the
optimal modulus choice from a mathematical and a hardware point of
view. Table 5.2 contains some interesting and useful prime numbers
of the form p = 2^n + 1.
Table 5.2 Gaussian Primes

n       2^n + 1

2       5
4       17
8       257
16      65537
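
The entries of Table 5.2 and the modulus argument above can be reproduced with a small sketch. The helper names is_prime and admissible_moduli are hypothetical, and trial division is assumed to be adequate only because the candidates are small.

```python
def is_prime(n):
    """Trial-division primality test, adequate for small candidates."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def admissible_moduli(max_n):
    """Pairs (n, 2^n + 1), 2 <= n <= max_n, with 2^n + 1 prime.
    Every such prime has the form 4k + 1 with k = 2^(n-2)."""
    return [(n, 2 ** n + 1) for n in range(2, max_n + 1)
            if is_prime(2 ** n + 1)]


for n, p in admissible_moduli(16):
    k = 2 ** (n - 2)
    # 2^n - 1 leaves remainder 3 mod 4, so it can never equal 4k + 1.
    print(n, p, p == 4 * k + 1, (2 ** n - 1) % 4)
```

For n up to 16 the sketch lists exactly the four primes of the table.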

For reasons stated previously, the discussion will be based on a
2n-bit modulus. Since our optimal modulus choice was found to be p
= 2^n + 1, we can choose p = 2^{2n} + 1. Although such a choice
requires a (2n + 1)-bit system, this can be easily compressed to a
2n-bit one [Tay85a].

A complete SM/QRNS Complex ALU should have the ability of
performing the forward mapping f_2 given by equation (3.1), a complex
addition consisting of two real additions modulo p, a complex
multiplication consisting of two real multiplications modulo p and
the inverse QRNS mapping f_2^{-1} given by equation (3.2). The forward
and inverse QRNS mappings (equations (3.1), (3.2)) involve the
number j (the root of the congruence x^2 + 1 ≡ 0 mod p), as well as
the numbers 2^{-1} mod p and (2^{-1} j^{-1}) mod p. Since we chose p = 2^{2n} +
1, then

j = 2^n      (5.1)

For 2^{-1}, suppose that 2^{-1} ≡ x mod p. Then 2 · x ≡ 1 mod p, or 2x =
p + 1 = 2^{2n} + 2. Therefore

2^{-1} = 2^{2n-1} + 1      (5.2)

Finally, since j = 2^n, then 2^{-1} j^{-1} = (2^{2n-1} + 1) · 2^{-n} = 2^{n-1} + 2^{-n}
= 2^{n-1} + j^{-1}. But it was proven in Chapter 3 that j^{-1} ≡ -j = -2^n.
Consequently

2^{-1} · j^{-1} = 2^{n-1} - 2^n      (5.3)
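
Equations (5.1), (5.2) and (5.3) are easy to confirm numerically. The sketch below (the function name qrns_constants is hypothetical) builds the three constants for p = 2^{2n} + 1 and checks the defining congruences directly.

```python
def qrns_constants(n):
    """Constants of equations (5.1)-(5.3) for the modulus p = 2^(2n) + 1."""
    p = 2 ** (2 * n) + 1
    j = 2 ** n                                  # (5.1): j^2 ≡ -1 mod p
    half = 2 ** (2 * n - 1) + 1                 # (5.2): 2^{-1} mod p
    half_j_inv = (2 ** (n - 1) - 2 ** n) % p    # (5.3): 2^{-1} j^{-1} mod p
    return p, j, half, half_j_inv


for n in (1, 2, 4, 8):   # p = 5, 17, 257, 65537
    p, j, half, half_j_inv = qrns_constants(n)
    print(p,
          (j * j + 1) % p == 0,
          (2 * half) % p == 1,
          (2 * j * half_j_inv) % p == 1)
```

All three congruences hold for each listed n. Since every constant is a power of two or a sum or difference of two such powers, multiplying by it reduces to shifts and additions modulo p.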

It is obvious from equations (5.1), (5.2), and (5.3) that
multiplication by the entries j, 2^{-1} and 2^{-1} j^{-1} can be realized
with only binary shifts and adds mod p. Our complete system should
therefore consist of the following components: mod p negator, mod p
adder, mod p multiplier, and binary m-bit shifter mod p. Taylor has
reported on all the above components in his SM/QRNS Complex ALU
design [Tay85a] and has found the propagation delays shown in
Table 5.3. In this table, Δ corresponds to a NAND gate delay and n
is the wordlength of the data bus.
Figures 5.3 and 5.4 show the conventional hardware implementations
of the forward and inverse QRNS mappings described by equations
(3.1) and (3.2). The propagation delays calculated by using the
data from Table 5.3 and Figures 5.3 and 5.4 for the complex add,
complex multiply, and QRNS mappings are shown in Table 5.4.

Table 5.3 Propagation Delays (Δ = one NAND gate delay)

Device                  Delay

negate                  5Δ
mod p adder             8Δ
mod p multiplier        (7n - 5)Δ
m bit-shifter           mΔ

Figure 5.3 Conventional hardware implementation of the
QRNS mapping f_2 (legible labels: x, y, mod p)

Figure 5.4 Conventional hardware implementation of the QRNS
inverse mapping f_2^{-1} (legible labels: z*, x, y, mod p)

Table 5.4 Propagation Delay requirements for a SM/QRNS Complex
Arithmetic Unit based on the modulus choice p = 2^{2n} + 1

Function                    Propagation Delay

Mapping                     (13 + n)Δ
Complex Addition            8Δ
Complex Multiplication      (14n - 5)Δ
Inverse Mapping             (21 + n)Δ

5.1.2 PRNS Single-Modulus Complex ALU

In the previous section, a SM/QRNS 2n-bit system was discussed
based on a modulus choice of p = 2^{2n} + 1. One way to decrease the
hardware requirements and increase the speed of a Single-Modulus
QRNS based system is to split the 2n-bit dynamic range into two
halves, by choosing a set of two moduli, each being approximately an
n-bit number. Such an approach would suffer, in general, all the
disadvantages of a multi-moduli RNS; that is, difficult magnitude
scaling, sign detection, and overflow detection. The optimal
solution would be to remain in a Single-Modulus environment but
increase the number of parallel channels, decreasing at the same
time the wordlength of each channel.

Games [Gam, Gam85a] found an algorithm through which complex
numbers can be approximated by third-order polynomials. The
motivation behind this scheme is that by breaking a complex number x
+ jy, where x, y ∈ R, into a four-tuple of smaller-wordlength
components, the memory address space required for the memory
intensive RNS arithmetic is reduced. The four-tuple consists of the
numbers u_0, u_1, u_2, and u_3 defined by

x + jy ≈ u_0 + u_1 z + u_2 z^2 + u_3 z^3,   u_i ∈ Z_p, i = 0, ..., 3,   z = e^{jπ/4}      (5.4)

Games [Gam, Gam85a] has shown that by linearly combining vectors in
Z[e^{jπ/4}], a complex number (x, y) can be approximated with a
third-order polynomial approximation (u_0, u_1, u_2, u_3) as in equation
(5.4), with u_0, u_1, u_2, u_3 being n-bit numbers each, with almost the
same quantization error as if the complex number were quantized in
only two dimensions with a 2n-bit real part x and a 2n-bit imaginary
part y.

Taking into account that the polynomial in equation (5.4) is
defined in terms of z, such that z^4 = e^{jπ} = -1, it is obvious that
the product of two such polynomial representations is taken mod
(z^4 + 1). As a result, the PRNS described in Chapter 4 can perform
such a polynomial product mod (z^4 + 1) in a totally parallel manner
using only four real multiplies and no adds. The fact that each one
of the four parallel paths now has half the wordlength that it had
before implies that smaller address-space tables can be used in the
memory lookup designs, which in turn become simpler and faster.
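
The claim that the product of two such representations is a polynomial product mod (z^4 + 1) can be illustrated numerically. The sketch below does not implement Games' approximation algorithm, which is not reproduced here; it only assumes two arbitrary integer four-tuples and checks that their negacyclic product represents the product of the two complex numbers. The function names are hypothetical.

```python
import cmath

Z = cmath.exp(1j * cmath.pi / 4)   # z = e^{j*pi/4}, so z^4 = -1


def to_complex(u):
    """Value of u_0 + u_1 z + u_2 z^2 + u_3 z^3 at z = e^{j*pi/4}."""
    return sum(c * Z ** k for k, c in enumerate(u))


def negacyclic_product(u, v):
    """Coefficients of u(z) * v(z) mod (z^4 + 1), over the integers."""
    n = len(u)
    w = [0] * n
    for i, ui in enumerate(u):
        for k, vk in enumerate(v):
            if i + k < n:
                w[i + k] += ui * vk
            else:
                w[i + k - n] -= ui * vk   # z^n = -1 wraps with a sign change
    return w


u = [3, -1, 2, 5]    # arbitrary example four-tuples (assumed values)
v = [1, 4, -2, 0]
w = negacyclic_product(u, v)
print(w)
print(abs(to_complex(u) * to_complex(v) - to_complex(w)) < 1e-9)
```

The second line printed is True: the four-tuple w represents the product of the two complex numbers represented by u and v.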

5.1.2.1 The 4th-order PRNS based on polynomial product modulo
x^4 + 1 in Z_m

Since the product of two complex numbers can now be performed as
a product of two third order polynomials modulo x^4 + 1, the 4th-
order PRNS forward and inverse isomorphic mappings, f_4 and f_4^{-1}, must
be presented now. We will keep our modulus m to be a prime number
p.

From Corollary 4.1, the congruence x^4 ≡ -1 mod p has four
distinct solutions in Z_p if and only if our modulus is

p = 8 · k + 1      (5.5)

These solutions are unique and the unique factorization of x^4 +
1 in Z_p is

x^4 + 1 ≡ (x - r_0)(x - r_1)(x - r_2)(x - r_3) mod p      (5.6)

But according to Theorems 4.20 and 4.21, since the order of the
congruence is four, which is an even number, and r_0 satisfies x^4 + 1 ≡
0 mod p, then -r_0, r_0^{-1} and -r_0^{-1} should also be solutions of the
same congruence. So the four solutions are r_0, -r_0, r_0^{-1}, -r_0^{-1} and
x^4 + 1 can be written as

x^4 + 1 ≡ (x - r_0)(x + r_0)(x - r_0^{-1})(x + r_0^{-1}) mod p      (5.7)

Finally from Theorem 4.12 r_0^{-1} ≡ -r_0^3 and therefore

x^4 + 1 ≡ (x - r_0)(x + r_0)(x + r_0^3)(x - r_0^3) mod p      (5.8)

If A(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3 is a third order polynomial
with coefficients a_0, a_1, a_2, a_3 ∈ Z_p, then the 4th order PRNS
mapping f_4 which maps (a_0, a_1, a_2, a_3) onto (a_0*, a_1*, a_2*, a_3*) is
given by equation (4.7), which in this case becomes

a_0* = A(x) mod (x - r_0)   = A(r_0)    ≡ (a_0 + a_1 r_0 + a_2 r_0^2 + a_3 r_0^3) mod p      (5.9)
a_1* = A(x) mod (x + r_0)   = A(-r_0)   ≡ (a_0 - a_1 r_0 + a_2 r_0^2 - a_3 r_0^3) mod p
a_2* = A(x) mod (x + r_0^3) = A(-r_0^3) ≡ (a_0 - a_1 r_0^3 - a_2 r_0^2 - a_3 r_0) mod p
a_3* = A(x) mod (x - r_0^3) = A(r_0^3)  ≡ (a_0 + a_1 r_0^3 - a_2 r_0^2 + a_3 r_0) mod p

where the reductions use r_0^4 ≡ -1 mod p, so that r_0^6 ≡ -r_0^2 and
r_0^9 ≡ r_0.

The inverse 4th-order PRNS mapping is given by A(x) =
Σ_{i=0}^{3} a_i* Q_i(x) (equation (4.8)), where the Q_i(x)s are given by
equation (4.9).

In our case we get Q_0(x) = 4^{-1} · (1 + r_0^{-1} x + r_0^{-2} x^2 + r_0^{-3} x^3).
By using Theorem 4.12 (r_i^{-k} ≡ -r_i^{N-k}) we find

Q_0(x) ≡ 2^{-2} - 2^{-2} r_0^3 x - 2^{-2} r_0^2 x^2 - 2^{-2} r_0 x^3      (5.10)
Q_1(x) ≡ 2^{-2} + 2^{-2} r_0^3 x - 2^{-2} r_0^2 x^2 + 2^{-2} r_0 x^3
Q_2(x) ≡ 2^{-2} + 2^{-2} r_0 x + 2^{-2} r_0^2 x^2 + 2^{-2} r_0^3 x^3
Q_3(x) ≡ 2^{-2} - 2^{-2} r_0 x + 2^{-2} r_0^2 x^2 - 2^{-2} r_0^3 x^3

The inverse mapping f_4^{-1} maps (a_0*, a_1*, a_2*, a_3*) onto (a_0, a_1, a_2,
a_3) where, according to equation (5.10), a_i, i = 0, ..., 3 are given
by

a_0 ≡ [2^{-2} (a_0* + a_1* + a_2* + a_3*)] mod p      (5.11)
a_1 ≡ [2^{-2} r_0^3 (a_1* - a_0*) + 2^{-2} r_0 (a_2* - a_3*)] mod p
a_2 ≡ [2^{-2} r_0^2 (a_3* + a_2* - a_1* - a_0*)] mod p
a_3 ≡ [2^{-2} r_0 (a_1* - a_0*) + 2^{-2} r_0^3 (a_2* - a_3*)] mod p

Example 5.1: Perform in $Z_{17}$ the product $A(x) \cdot B(x) \bmod (x^4 + 1)$
where $A(x) = 5 + 6x + 8x^2 + 13x^3$ and $B(x) = 9 + 14x + 10x^2 + 12x^3$.

Here, our modulus choice is the prime number $p = 17 = 8 \cdot 2 + 1$.
Then, by using equation (5.5), the congruence $x^4 + 1 \equiv 0 \bmod 17$ has
four distinct and unique solutions in $Z_{17}$. One solution is $r_0 \equiv 2$
and the others are $-r_0 \equiv -2$, $r_0^{-1} \equiv 9$ and $-r_0^{-1} \equiv -9$.
So $x^4 + 1 \equiv (x - 2)(x + 2)(x - 9)(x + 9) \bmod 17$.

The PRNS mapping maps A(x) and B(x) onto $(a_0^*, a_1^*, a_2^*, a_3^*)$
and $(b_0^*, b_1^*, b_2^*, b_3^*)$ respectively, where $a_i^*$, $b_i^*$,
i = 0, ..., 3 are given by equation (5.9), which becomes

$$\begin{aligned}
a_0^* &\equiv 0 & b_0^* &\equiv 3 \\
a_1^* &\equiv 6 & b_1^* &\equiv 10 \\
a_2^* &\equiv 1 & b_2^* &\equiv 3 \\
a_3^* &\equiv 13 & b_3^* &\equiv 3
\end{aligned} \tag{5.12}$$

The polynomial product coefficients $c_i^*$, i = 0, ..., 3 are given by

$$c_i^* \equiv (a_i^* \cdot b_i^*) \bmod 17, \quad i = 0, \ldots, 3 \tag{5.13}$$

or

$$c_0^* \equiv 0, \quad c_1^* \equiv 9, \quad c_2^* \equiv 3, \quad c_3^* \equiv 5$$

The inverse mapping $f_4^{-1}$ maps $(c_0^*, c_1^*, c_2^*, c_3^*)$ onto
$C(x) = c_0 + c_1x + c_2x^2 + c_3x^3$ where $c_i$, i = 0, ..., 3 are given by
(5.11). In this case $2^{-2} \equiv 13$ and $r_0 \equiv 2$, so $c_0 \equiv 0$,
$c_1 \equiv 0$, $c_2 \equiv 16$, $c_3 \equiv 9$ and the product of the
polynomials mod $(x^4 + 1)$ in $Z_{17}$ is
$(5 + 6x + 8x^2 + 13x^3) \cdot (9 + 14x + 10x^2 + 12x^3) \bmod (x^4 + 1) = 16x^2 + 9x^3$.

To  double  check  the  above  result,  multiply  the  two  polynomials 
directly. 
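The direct check suggested above can be sketched in Python. The helper name `negacyclic_mul` is hypothetical; it multiplies the two polynomials term by term, folds every power $x^{k}$ with $k \ge 4$ back using $x^4 \equiv -1$, and reduces the coefficients modulo 17.

```python
def negacyclic_mul(a, b, p):
    """Coefficients of a(x) * b(x) mod (x^n + 1), reduced modulo p."""
    n = len(a)
    c = [0] * n
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            k = i + j
            if k < n:
                c[k] += ai * bj
            else:
                c[k - n] -= ai * bj  # x^n = -1 flips the sign
    return [ck % p for ck in c]


print(negacyclic_mul([5, 6, 8, 13], [9, 14, 10, 12], 17))  # [0, 0, 16, 9]
```

The output corresponds to $16x^2 + 9x^3$, in agreement with the result obtained through the PRNS channels.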
5.1.2.2 Implementation issues for complex arithmetic using a 4th-order SM/PRNS

In  section  5.1.1  the  problem  of  implementing  complex  arithmetic 
units  was  examined  using  a SM/QRNS  with  2n-bit  data  bus.  Games 
developed an algorithm by which a complex number x + jy, with x and
y being 2n-bit numbers, can be approximated by a third-order
polynomial $u_0 + u_1z + u_2z^2 + u_3z^3$ with $u_0, u_1, u_2, u_3$ being n-bit
quantization  error  behavior.  It  is  meaningful  to  say  that  the 
SM/QRNS  presented  in  the  previous  section  and  which  had  two  parallel 
channels  (one  real  and  one  imaginary)  each  one  being  2n-bits  wide, 
can  now  be  substituted  by  a SM/PRNS  of  order  four,  with  four 
polynomial  channels  each  one  being  n-bits  wide.  Our  SM/PRNS  would 
now  accept  an  n-bit  modulus  choice. 

Several  SM/PRNS  implementation  issues  for  complex  arithmetic 
will  be  briefly  presented  now.  The  two  cases  of  table  lookup  design 
as  well  as  the  conventional  hardware  realization  will  be  discussed. 
Our  Single  Modulus  PRNS  system  will  assume  an  n-bit  data  bus  as  well 
as  an  n-bit  modulus  choice.  Comparisons  between  an  n-bit  SM/PRNS  of 
order  four  and  its  corresponding  2n-bit  SM/QRNS  will  be  performed. 
The two systems will be compared in terms of hardware and
throughput.

Case  (i):  Table  lookups. 

In Figures 5.5 and 5.6 the table lookup implementations of the
forward and inverse PRNS mappings of order four ($f_4$, $f_4^{-1}$),
described by equations (5.9) and (5.11), are shown.
Figure 5.5 Table lookup implementation of the 4th-order
PRNS mapping
Figure 5.6 Table lookup implementation of the 4th-order PRNS
inverse mapping
It is clear that the forward PRNS mapping requires eight
tables in two levels. Based on an n-bit modulus choice (data lines
n bits wide), a hardware investment of $2^{2n} \times n$ bits per memory table
is required, which amounts to a total of $8 \times 2^{2n} \times n = 2^{2n+3} \times n$ bits
for the entire PRNS mapping. The inverse PRNS mapping $f_4^{-1}$ requires
twelve such tables in two levels, or a total investment of
$12 \times 2^{2n} \times n$ bits. Both the complex add and the complex
multiplication in the 4th-order SM/PRNS require four table lookup
operations modulo m, or $4 \times 2^{2n} \times n = 2^{2n+2} \times n$ bits of memory.
Table 5.5 contains the memory requirements for a 4th-order SM/PRNS
Complex Arithmetic system using an n-bit modulus.

Table 5.5 Memory requirements for a 4th-order SM/PRNS Complex
Arithmetic unit based on an n-bit modulus

Function              Memory Size in Bits
Mapping               $2^{2n+3} \times n$
Complex Adder         $2^{2n+2} \times n$
Complex Multiplier    $2^{2n+2} \times n$
Inverse Mapping       $12 \times 2^{2n} \times n$

Table  5.6  provides  the  ratios  of  the  memory  requirements  of  a 
SM/QRNS  unit  based  on  a 2n-bit  data  bus  and  its  corresponding  4th- 
order  SM/PRNS  which  is  based  on  an  n-bit  data  bus  (Tables  5.1  and 
5.5).  We  can  see  that  the  decomposition  of  a complex  number  into 
four  channels  of  halved  wordlength  and  the  processing  using  the 
four-channel  SM/PRNS  instead  of  the  two-channel  SM/QRNS  result  in 
great  savings  of  memory  space. 
Table 5.6 Ratios of memory requirements of a 2n-bit
SM/QRNS to an n-bit 4th-order SM/PRNS

Function                  SM-QRNS / SM-PRNS
Mapping                   $2^{2n-1}$
Complex Addition          $2^{2n}$
Complex Multiplication    $2^{2n}$
Inverse Mapping           $2^{2n+2}/12$

As  an  example  let  us  compare  the  memory  requirements  of  a 4-bit 
4th-order  SM/PRNS  with  a prime  modulus  p = 17  to  its  corresponding 
8-bit  SM/QRNS  with  a prime  modulus  p = 257.  Observe  that  p = 257 
requires  a 9-bit  system  which  can  easily  be  compressed  to  an  8-bit 
one,  [Tay85a].  Similarly,  the  5-bit  system  based  on  p = 17  can  be 
compressed to a 4-bit one. Furthermore, since the modulus
$p = 17 = 8 \cdot 2 + 1$ is of the form $8k + 1$, the 4th-order PRNS mapping
exists, while since $p = 257 = 4 \cdot 64 + 1$ is of the form $4k + 1$, the
QRNS mapping exists. Table 5.7 summarizes the memory requirements
for  the  two  systems  described. 
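The PRNS entries of Table 5.7 follow directly from the table counts of Table 5.5. The Python sketch below recomputes them for n = 4; the function name is hypothetical, and the QRNS figure of 1 Mb per unit is simply taken from Table 5.7 rather than derived here, since Table 5.1 is outside this section.

```python
def prns_table_memory_bits(n):
    """Memory in bits of each 4th-order SM/PRNS unit for an n-bit modulus."""
    table = (2 ** (2 * n)) * n        # one lookup table of 2^(2n) x n bits
    return {
        "mapping": 8 * table,         # eight tables
        "adder": 4 * table,           # four tables
        "multiplier": 4 * table,      # four tables
        "inverse mapping": 12 * table # twelve tables
    }


QRNS_8BIT_BITS = 2 ** 20  # 1 Mb per unit, the value quoted in Table 5.7

prns = prns_table_memory_bits(4)
savings = {k: round(100 * (1 - v / QRNS_8BIT_BITS), 1) for k, v in prns.items()}
print(prns)     # 8192, 4096, 4096 and 12288 bits, i.e. 8Kb, 4Kb, 4Kb, 12Kb
print(savings)  # 99.2, 99.6, 99.6 and 98.8 percent
```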

In addition to the advantages in memory savings that the 4th-order
PRNS provides against the corresponding QRNS, there is another
important point to be considered. Suppose that both the 4th-order
PRNS and QRNS systems have the same n-bit wordlengths. Assume that
both systems can be configured as pipelined processors, where 2n-bit
address-space tables are used. Then the throughput of these systems
will be the same. Since an n-bit 4th-order PRNS is equivalent in
terms of quantization error with a 2n-bit QRNS system, clearly the
4th-order PRNS can provide more precision than a QRNS while
maintaining the same throughput.

Table 5.7 Comparison of memory requirements between an 8-bit
SM/QRNS and its corresponding 4-bit SM/PRNS

             Mapping   Adder   Multiplier   Inverse Mapping
SM/QRNS      1Mb       1Mb     1Mb          1Mb
SM/PRNS      8Kb       4Kb     4Kb          12Kb
% savings    99.2      99.6    99.6         98.8

QRNS: p = 257, 8 bits
PRNS: p = 17, 4 bits

Case  (ii):  Conventional  hardware. 

Again in this case we will consider an n-bit SM/PRNS system. To
maintain our hardware design elegance, our modulus should be chosen
to be one of $p_1 = 2^n - 1$, $p_2 = 2^n$ or $p_3 = 2^n + 1$. For the 4th-order
PRNS isomorphic mapping to exist, the modulus choice must satisfy
$p = 8k + 1$ (equation (5.5)). Equating $p_1 = 2^n - 1$ to $p = 8k + 1$
we get $2^n - 1 = 8k + 1$ or $k = 2^{n-3} - 1/4$, obviously not an
integer, and therefore $p_1$ is inadmissible. For $p_2 = 2^n$, the
multiplicative inverse of four (i.e. $2^{-2}$) does not exist and
therefore the inverse mapping $f_4^{-1}$ (equation (5.11)) cannot be
defined. Finally, the choice $p_3 = 2^n + 1$ gives $2^n + 1 = 8k + 1$
or $k = 2^{n-3}$, which is an integer. Therefore, $p = 2^n + 1$ is the best
modulus choice for our SM/PRNS.
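The two admissibility conditions used in this argument (the form $8k + 1$ and the existence of $2^{-2}$) can be checked mechanically. The following Python sketch does so for the three candidate moduli; the function name and the dictionary keys are hypothetical, and primality of the modulus, which the examples in this chapter also rely on, is deliberately not tested here.

```python
from math import gcd


def modulus_form_check(n):
    """Check the form 8k+1 and the invertibility of 4 for 2^n-1, 2^n, 2^n+1."""
    candidates = {"2^n - 1": 2 ** n - 1, "2^n": 2 ** n, "2^n + 1": 2 ** n + 1}
    report = {}
    for name, m in candidates.items():
        report[name] = {
            "is_8k_plus_1": m % 8 == 1,        # needed for the 4th-order mapping
            "four_invertible": gcd(4, m) == 1, # needed for 2^-2 to exist
        }
    return report


for name, flags in modulus_form_check(4).items():
    print(name, flags)
```

For n = 4 only $2^n + 1 = 17$ passes both checks, matching the conclusion reached above.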
Our Single-Modulus PRNS system should have the ability to
perform the PRNS isomorphic mapping $f_4$, the inverse mapping $f_4^{-1}$,
complex addition consisting of four real additions modulo p, and
complex multiplication also consisting of four real multiplications
modulo p. Equations (5.9) and (5.11) show that the forward and
inverse PRNS mappings call for $r_0$, $r_0^2$, $r_0^3$, $2^{-2}$, $2^{-2} \cdot r_0$,
$2^{-2} \cdot r_0^2$ and $2^{-2} \cdot r_0^3$. Since $r_0$ is a root of
$x^4 \equiv -1 \bmod p$ with $p = 2^n + 1$, then
$r_0^4 \equiv -1 \bmod (2^n + 1) \equiv 2^n$ and $r_0 \equiv 2^{n/4}$. Obviously, n must
be a multiple of four. For $2^{-2}$ we get that
$2^{-2} \equiv (2^{-1})^2 \equiv (2^{n-1} + 1)^2$ (since $p = 2^n + 1$ and from
equation (5.2)). So
$2^{-2} \equiv (2^{n-1} + 1)^2 = (2^{n-1})^2 + 2 \cdot 2^{n-1} + 1 \equiv 2^{2n-2} + 2^n + 1 \equiv 2^{2n-2}$.
Equation (5.14) provides expressions of all the required variables.

$$\begin{aligned}
r_0 &\equiv 2^{n/4} \\
r_0^2 &\equiv 2^{n/2} \\
r_0^3 &\equiv 2^{3n/4} \\
2^{-2} &\equiv 2^{2n-2} \\
2^{-2} \cdot r_0 &\equiv 2^{(n/4)-2} \\
2^{-2} \cdot r_0^2 &\equiv 2^{(n/2)-2} \\
2^{-2} \cdot r_0^3 &\equiv 2^{(3n/4)-2}
\end{aligned} \tag{5.14}$$

Obviously, all the entries of equation (5.14) can be realized with
binary shifts modulo p.
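The entries of equation (5.14) can be verified numerically. The Python sketch below builds them as powers of two for a given n and is meant only as a consistency check; the function name and dictionary keys are hypothetical. Negative exponents such as $(n/4) - 2$ for n = 4 are handled by reducing the exponent modulo 2n, which is legitimate because $2^n \equiv -1$ implies $2^{2n} \equiv 1 \pmod{2^n + 1}$.

```python
def shift_constants(n):
    """Constants of equation (5.14) as powers of two modulo p = 2^n + 1."""
    if n % 4:
        raise ValueError("n must be a multiple of four")
    p = 2 ** n + 1

    def pow2(e):
        # 2^(2n) = 1 (mod p), so exponents may be reduced modulo 2n
        return pow(2, e % (2 * n), p)

    return p, {
        "r0": pow2(n // 4),
        "r0^2": pow2(n // 2),
        "r0^3": pow2(3 * n // 4),
        "2^-2": pow2(2 * n - 2),
        "2^-2 r0": pow2(n // 4 - 2),
        "2^-2 r0^2": pow2(n // 2 - 2),
        "2^-2 r0^3": pow2(3 * n // 4 - 2),
    }


p, c = shift_constants(8)
print(p, c)  # p = 257, r0 = 4, 2^-2 = 193, ...
```

For n = 8 this gives $r_0 = 4$ and $2^{-2} = 193$ modulo 257, the same constants that appear in the multiplication examples of Section 5.2.1.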

The Conventional Hardware implementations of the forward and
inverse 4th-order PRNS mappings $f_4$ and $f_4^{-1}$, given by equations (5.9)
and (5.11), are shown in Figures 5.7 and 5.8.
Figure 5.7 Conventional hardware implementation of the
4th-order PRNS mapping

Figure 5.8 Conventional hardware implementation of the 4th-order
PRNS inverse mapping
Consulting Table 5.3 and Figures 5.7 and 5.8, propagation delays
are obtained for the 4th-order PRNS mapping, complex addition,
complex multiplication, and PRNS inverse mapping, and they are shown
in Table 5.8.

Table 5.8 Propagation Delay requirements for a 4th-order SM/PRNS
Complex Arithmetic unit based on a modulus $p = 2^n + 1$

Function                  Propagation Delay
Mapping                   $[21 + (3n/4)]\Delta$
Complex Addition          $8\Delta$
Complex Multiplication    $(7n - 5)\Delta$
Inverse Mapping           $[19 + (3n/4)]\Delta$

Comparing the propagation delay requirements of the SM/QRNS unit
based on a modulus $p = 2^{2n} + 1$ to its corresponding 4th-order
SM/PRNS, which is based on a modulus $p = 2^n + 1$ (Tables 5.4 and
5.8), one finds that the PRNS requires less propagation delay for
the  inverse  mapping  and  for  the  complex  multiplication  than  its 
corresponding  QRNS,  for  any  wordlength  n.  The  delay  for  the  complex 
addition  is  the  same  for  both  systems.  The  delay  for  the  forward 
mapping  is  larger  in  the  PRNS  for  values  of  n < 32,  the  same  in  both 
systems  for  n = 32  and  larger  in  the  QRNS  for  n > 32.  It  is  clear 
that  for  multiplicative  intensive  environments,  the  PRNS  outperforms 
the  corresponding  QRNS  in  terms  of  speed. 

Consider the following example of computing a 16-point complex
circular convolution. The two complex sequences to be convolved are
$G = g_0, g_1, \ldots, g_{15}$ and $H = h_0, h_1, \ldots, h_{15}$, where
$g_i, h_i \in \mathbb{C}$ but have been quantized over four parallel channels,
using equation (5.4). Call $I = G \circledast H$ the circular convolution of
G and H. It is known that

$$I_\lambda = \sum_{i + j \equiv \lambda \bmod 16} g_i h_j, \quad \lambda = 0, 1, \ldots, 15$$

Such a computation of $I_\lambda$, $\lambda = 0, 1, \ldots, 15$ requires 256 complex
multiplications and 240 2-operand complex additions. Assuming a
totally serial implementation of the above circular convolution,
Table 5.9 provides the total propagation delay for both the PRNS and
QRNS realizations. Here we assume a 4-bit PRNS and an 8-bit QRNS.

Table 5.9 Propagation delay requirements for
16-point complex circular convolution
(4-bit PRNS versus 8-bit QRNS)

                                 SM/PRNS      SM/QRNS
Mapper (32 maps)                 768Δ         544Δ
Multiplier (256 multipliers)     5888Δ        13056Δ
Adder (240 adds)                 1920Δ        1920Δ
Inverse Mapper (16 maps)         352Δ         400Δ
Total                            8928Δ        15920Δ
Clearly, Table 5.9 shows that the PRNS is almost twice as fast
as  the  corresponding  QRNS,  assuming  that  no  scaling  is  involved  in 
either  one  of  the  systems.  However,  the  price  paid  is  a higher 
modular  complexity  for  the  PRNS  because  of  more  modular  units  now 
required  by  the  multiple  channels. 
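The PRNS column of Table 5.9 can be reproduced from the delay expressions of Table 5.8 and the operation counts of the convolution. The Python sketch below does this for n = 4; the function and variable names are hypothetical, delays are expressed in multiples of the basic delay unit used in the tables, and the QRNS total is copied from Table 5.9 rather than derived, since Table 5.4 is outside this section.

```python
def prns_delay_units(n):
    """Per-operation delays of Table 5.8, in multiples of the basic delay unit."""
    return {
        "mapping": 21 + 3 * n // 4,
        "multiplication": 7 * n - 5,
        "addition": 8,
        "inverse mapping": 19 + 3 * n // 4,
    }


# Operation counts of the serial 16-point complex circular convolution
counts = {"mapping": 32, "multiplication": 256, "addition": 240, "inverse mapping": 16}

per_stage = {op: counts[op] * d for op, d in prns_delay_units(4).items()}
prns_total = sum(per_stage.values())
QRNS_TOTAL = 15920  # total quoted in Table 5.9 for the 8-bit QRNS

print(per_stage)                          # 768, 5888, 1920, 352
print(prns_total)                         # 8928
print(round(QRNS_TOTAL / prns_total, 2))  # 1.78
```

The ratio of about 1.78 is what the text summarizes as "almost twice as fast".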

As  a final  conclusion,  it  is  now  clear  that  for  small-wordlength 
systems  using  table  lookups,  the  PRNS  offers  substantial  savings  in 
memory  requirements  over  the  QRNS.  For  larger-wordlength  systems, 
where  the  conventional  designs  are  more  attractive  because  of 
address  space  limitations  of  high-speed  semiconductor  memory 
devices,  the  PRNS  can  provide  significant  speed  improvements  over 
the  QRNS,  especially  for  multiplicative  intensive  environments. 

5.1.2.3 Weaknesses of the complex PRNS

The  previous  discussion  of  complex  arithmetic  implemented  in  the 
4th-order  SM/PRNS  showed  that  decomposition  of  a complex  number 
(with  2n-bit  real  part  and  2n-bit  imaginary  part)  into  four 
polynomial  channels  using  Games'  Algorithm  (where  each  polynomial 
channel  is  n-bits  wide),  results  in  great  savings  of  memory  space 
and  significant  speed  improvements. 

Unfortunately,  Games'  algorithm  is  a current  drawback  of  the 
PRNS  design  since  it  is  a non-systematic  and  slow  process.  Unless  a 
better  algorithm  can  be  found  that  decomposes  a complex  number  into 
four  polynomial  channels  with  almost  the  same  quantization  error,  or 
a suboptimal  one  that  trades  quantization  error  for  speed,  it  would 
not  be  possible  to  really  take  full  advantage  of  the  PRNS. 
Another important weakness is that up to this moment no
efficient scaling technique has been found in the Single-Modulus
Complex PRNS. Consider the following two ways of decomposing a
two-channel 2n-bit complex SM/QRNS into four channels of n bits each:

(i) Using a Double-Moduli QRNS with moduli set $\{p_1, p_2\}$ where $p_1$,
$p_2$ are n bits wide.

(ii) Using the already discussed 4th-order SM/PRNS with one n-bit
modulus p.

Using  the  general  scaling  policy  (Chinese  Remainder  Theorem)  scaling 
by  a scale  factor  k can  be  mechanized  for  the  above  two  cases  using 
table  lookups  and  is  shown  in  Figures  5.9  and  5.10. 

In Figure 5.9 each one of the inverse QRNS mappers calls for two
tables (Figure 5.2), each one requiring $2^{2n} \times n$ bits (a total of
$4n \times 2^{2n}$ bits of memory). The Chinese Remainder Theorem (CRT)
mechanizes the following two equations:

$$\begin{aligned}
X &= (x_1 \cdot M_1 \cdot N_1 + x_2 \cdot M_2 \cdot N_2) \bmod m \\
Y &= (y_1 \cdot M_1 \cdot N_1 + y_2 \cdot M_2 \cdot N_2) \bmod m
\end{aligned} \tag{5.15}$$

where $m = p_1 \cdot p_2$, $M_i = m/p_i$, i = 1, 2, and $N_i \equiv M_i^{-1} \bmod p_i$.
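A minimal Python sketch of the two-modulus reconstruction in equation (5.15) follows. The function name `crt_pair` and the small moduli 13 and 17 are illustrative assumptions only; the three-argument `pow` with exponent -1 assumes Python 3.8 or later.

```python
def crt_pair(x1, x2, p1, p2):
    """Reconstruct X modulo m = p1 * p2 from its residues x1 and x2."""
    m = p1 * p2
    M1, M2 = m // p1, m // p2
    N1 = pow(M1, -1, p1)  # inverse of M1 modulo p1
    N2 = pow(M2, -1, p2)  # inverse of M2 modulo p2
    return (x1 * M1 * N1 + x2 * M2 * N2) % m


p1, p2 = 13, 17  # hypothetical coprime moduli
X = 100
print(crt_pair(X % p1, X % p2, p1, p2))  # 100
```

The same function applied to the residues of the imaginary part gives Y, which is why two identical CRT tables are counted in the memory estimate.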

Each one of the two equations (5.15) requires a ($2^{2n} \times 2n$)-bit
table, or a total of $4n \times 2^{2n}$ bits of memory is needed for the CRT.
For the mod $p_1$, mod $p_2$ operations the requirement is $4n \times 2^{2n}$ memory
bits, while each QRNS mapping $f_2$ uses two ($2^{2n} \times n$)-bit memory
tables (a total of $4n \times 2^{2n}$ bits of memory). Obviously, every time
a general scaling procedure is called in the Double-Moduli QRNS,
$16n \times 2^{2n}$ memory bits are being used in a totally parallel
architecture.
Figure 5.9 General scaling procedure (scale factor k) in a
Double-Moduli QRNS based on table lookups

Figure 5.10 General scaling procedure (scale factor k) in a
SM/PRNS based on table lookups
For the case of a 4th-order SM/PRNS approach (Figure 5.10) the
inverse PRNS mapping $f_4^{-1}$ requires (in the worst case) twelve
($2^{2n} \times n$)-bit memory tables (Figure 5.6), while the forward mapping
$f_4$ requires eight ($2^{2n} \times n$)-bit memory tables (Figure 5.5), and a
total investment of $20n \times 2^{2n}$ memory bits is needed per scaling call.
Since a Single-Modulus PRNS approach is based on the use of only one
modulus, all the tables are identically preprogrammed, while in a
Double-Moduli QRNS system some tables perform mod $p_1$ and some perform
mod $p_2$ operations. Therefore, SM/PRNS-based designs are more
attractive than the corresponding Double-Moduli QRNS ones.
Nevertheless, the SM/PRNS would not really outperform the
Double-Moduli QRNS before a more efficient scaling policy is found in
the case of Complex Arithmetic. A meaningful and attractive scaling
policy would be one that would make less frequent use of the general
scaling hardware unit of Figure 5.10. Such a case will be presented
in the next section of this chapter, where a SM/PRNS unit performs
the real arithmetic operation $(A \cdot B) \bmod p$, where $A, B \in Z_p$. In
that particular application the Single-Modulus nature of the system
requires that (for scaling purposes) the Polynomial CRT ($f_N^{-1}$) and
forward mapping $f_N$ be used only once at the very end of the
processing.


5.2  PRNS  Real  Arithmetic  ALUs 


5.2.1 Polynomial Decomposition of the Product $(A \cdot B) \bmod (2^n + 1)$

In this section, another interesting application of the PRNS is
presented. It is the decomposition of the product $(A \cdot B) \bmod m$ in
polynomial channels, where $m = 2^n + 1$ and $A, B \in Z_m$. The
presentation is based on a Single-Modulus 4th-order PRNS but it can
be generalized to a PRNS of any higher order.

Without loss of generality assume that $n = 4\lambda$ (a multiple of
four). Our modulus choice $m = 2^n + 1$ implies an (n + 1)-bit system,
which can be compressed to an n-bit one, as discussed previously.
Our goal is to decompose $(A \cdot B) \bmod m$ into more than one channel,
each one having a wordlength $n_1$ where $n_1 < n$, in order to obtain
hardware savings and speed increase. One way to do this is by
choosing a set of moduli $P = \{p_1, p_2, \ldots, p_L\}$ such that
$m = p_1 \cdots p_L$, spanning the same dynamic range as the original
modulus m. Such a solution will result in a multi-moduli approach
with the well-known disadvantages in terms of magnitude scaling and
magnitude comparison.

Here,  an  effort  has  been  made  to  decompose  the  modulo  m channel 
into  more  than  one  channels  each  channel  having  a smaller  wordlength 
than  the  original  one,  while  all  the  channels  are  using  the  same 
modulus.  The  tool  which  has  been  used  is  the  Single-Modulus  PRNS. 
Besides this, an efficient scaling technique has been found in the
SM/PRNS, which uses the Polynomial Chinese Remainder Theorem
($f_N^{-1}$) only once, that is, at the very end of the processing.

Consider the n-bit integer numbers A and B. If A is given by

$$A = a_0 + a_1 2^1 + a_2 2^2 + \cdots + a_{n-1} 2^{n-1} \tag{5.16}$$

and n is a multiple of four, then A can be written in a base-$2^{n/4}$
form as follows:

$$\begin{aligned}
A ={}& (a_0 a_1 \ldots a_{(n/4)-1}) 2^0 + (a_{n/4} a_{(n/4)+1} \ldots a_{(n/2)-1}) 2^{n/4} \\
&+ (a_{n/2} a_{(n/2)+1} \ldots a_{(3n/4)-1}) 2^{n/2} + (a_{3n/4} a_{(3n/4)+1} \ldots a_{n-1}) 2^{3n/4}
\end{aligned} \tag{5.17}$$

or

$$A = A_0 \cdot 2^0 + A_1 \cdot 2^{n/4} + A_2 \cdot 2^{n/2} + A_3 \cdot 2^{3n/4} \tag{5.18}$$

where

$$\begin{aligned}
A_0 &= a_0 \cdot 2^0 + a_1 \cdot 2^1 + \cdots + a_{(n/4)-1} \cdot 2^{(n/4)-1} \\
A_1 &= a_{n/4} \cdot 2^0 + a_{(n/4)+1} \cdot 2^1 + \cdots + a_{(n/2)-1} \cdot 2^{(n/4)-1} \\
A_2 &= a_{n/2} \cdot 2^0 + a_{(n/2)+1} \cdot 2^1 + \cdots + a_{(3n/4)-1} \cdot 2^{(n/4)-1} \\
A_3 &= a_{3n/4} \cdot 2^0 + a_{(3n/4)+1} \cdot 2^1 + \cdots + a_{n-1} \cdot 2^{(n/4)-1}
\end{aligned} \tag{5.19}$$

and $a_i \in \{0, 1\}$, i = 0, 1, ..., n-1. So $A_0$, $A_1$, $A_2$, $A_3$ are
n/4-bit numbers (bytes). Similar is the situation for the n-bit
integer B, which can be written in byte form as

$$B = B_0 \cdot 2^0 + B_1 \cdot 2^{n/4} + B_2 \cdot 2^{n/2} + B_3 \cdot 2^{3n/4} \tag{5.20}$$

The product $(A \cdot B) \bmod m$ where $m = 2^n + 1$ can be written now
(taking into account equations (5.18) and (5.20)) as

$$\begin{aligned}
(A \cdot B) \bmod (2^n+1) ={}& (A_0 2^0 + A_1 2^{n/4} + A_2 2^{n/2} + A_3 2^{3n/4}) \cdot (B_0 2^0 + B_1 2^{n/4} + B_2 2^{n/2} + B_3 2^{3n/4}) \bmod (2^n+1) \\
={}& (B_0 A_0) \bmod (2^n+1) + (B_0 A_1 2^{n/4}) \bmod (2^n+1) + (B_0 A_2 2^{n/2}) \bmod (2^n+1) + (B_0 A_3 2^{3n/4}) \bmod (2^n+1) \\
&+ (B_1 A_0 2^{n/4}) \bmod (2^n+1) + (B_1 A_1 2^{n/2}) \bmod (2^n+1) + (B_1 A_2 2^{3n/4}) \bmod (2^n+1) + (B_1 A_3 2^{n}) \bmod (2^n+1) \\
&+ (B_2 A_0 2^{n/2}) \bmod (2^n+1) + (B_2 A_1 2^{3n/4}) \bmod (2^n+1) + (B_2 A_2 2^{n}) \bmod (2^n+1) + (B_2 A_3 2^{5n/4}) \bmod (2^n+1) \\
&+ (B_3 A_0 2^{3n/4}) \bmod (2^n+1) + (B_3 A_1 2^{n}) \bmod (2^n+1) + (B_3 A_2 2^{5n/4}) \bmod (2^n+1) + (B_3 A_3 2^{6n/4}) \bmod (2^n+1)
\end{aligned} \tag{5.21}$$

Since $2^n \bmod (2^n + 1) = -1$, then $2^{5n/4} \bmod (2^n + 1) = -2^{n/4}$ and
$2^{6n/4} \bmod (2^n + 1) = -2^{n/2}$. The product $(A \cdot B) \bmod (2^n + 1)$ can be
written as

$$\begin{aligned}
(A \cdot B) \bmod (2^n+1) ={}& (B_0 A_0 2^0) \bmod (2^n+1) + (B_0 A_1 2^{n/4}) \bmod (2^n+1) + (B_0 A_2 2^{n/2}) \bmod (2^n+1) + (B_0 A_3 2^{3n/4}) \bmod (2^n+1) \\
&- (B_1 A_3 2^0) \bmod (2^n+1) + (B_1 A_0 2^{n/4}) \bmod (2^n+1) + (B_1 A_1 2^{n/2}) \bmod (2^n+1) + (B_1 A_2 2^{3n/4}) \bmod (2^n+1) \\
&- (B_2 A_2 2^0) \bmod (2^n+1) - (B_2 A_3 2^{n/4}) \bmod (2^n+1) + (B_2 A_0 2^{n/2}) \bmod (2^n+1) + (B_2 A_1 2^{3n/4}) \bmod (2^n+1) \\
&- (B_3 A_1 2^0) \bmod (2^n+1) - (B_3 A_2 2^{n/4}) \bmod (2^n+1) - (B_3 A_3 2^{n/2}) \bmod (2^n+1) + (B_3 A_0 2^{3n/4}) \bmod (2^n+1)
\end{aligned}$$

or

$$\begin{aligned}
(A \cdot B) \bmod (2^n+1) ={}& (B_0 A_0 - B_1 A_3 - B_2 A_2 - B_3 A_1) \cdot 2^0 \bmod (2^n+1) \\
&+ (B_0 A_1 + B_1 A_0 - B_2 A_3 - B_3 A_2) \cdot 2^{n/4} \bmod (2^n+1) \\
&+ (B_0 A_2 + B_1 A_1 + B_2 A_0 - B_3 A_3) \cdot 2^{n/2} \bmod (2^n+1) \\
&+ (B_0 A_3 + B_1 A_2 + B_2 A_1 + B_3 A_0) \cdot 2^{3n/4} \bmod (2^n+1)
\end{aligned} \tag{5.22}$$

or

$$\begin{aligned}
(A \cdot B) \bmod (2^n+1) ={}& [(C_0 \bmod (2^n+1)) \cdot 2^0] \bmod (2^n+1) + [(C_1 \bmod (2^n+1)) \cdot 2^{n/4}] \bmod (2^n+1) \\
&+ [(C_2 \bmod (2^n+1)) \cdot 2^{n/2}] \bmod (2^n+1) + [(C_3 \bmod (2^n+1)) \cdot 2^{3n/4}] \bmod (2^n+1)
\end{aligned} \tag{5.23}$$


where 
$$\begin{aligned}
C_0 &= (B_0 A_0 - B_1 A_3 - B_2 A_2 - B_3 A_1) \bmod (2^n+1) \\
C_1 &= (B_0 A_1 + B_1 A_0 - B_2 A_3 - B_3 A_2) \bmod (2^n+1) \\
C_2 &= (B_0 A_2 + B_1 A_1 + B_2 A_0 - B_3 A_3) \bmod (2^n+1) \\
C_3 &= (B_0 A_3 + B_1 A_2 + B_2 A_1 + B_3 A_0) \bmod (2^n+1)
\end{aligned} \tag{5.24}$$

A closer observation of $C_i$, i = 0, ..., 3 shows that the $C_i$ are such
that they satisfy the following polynomial product modulo $(x^4 + 1)$
in $Z_{2^n+1}$:

$$A(x) \cdot B(x) \bmod (x^4+1) = (A_0 + A_1x + A_2x^2 + A_3x^3) \cdot (B_0 + B_1x + B_2x^2 + B_3x^3) \bmod (x^4+1) = C_0 + C_1x + C_2x^2 + C_3x^3 \tag{5.25}$$
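The correspondence between the explicit sums of equation (5.24) and the polynomial product of equation (5.25) can be checked with a short Python sketch. Both function names are hypothetical, and the coefficients are left as plain signed integers (no modular reduction) so that the signs are visible.

```python
def c_coefficients(A, B):
    """C0..C3 written out term by term, as in equation (5.24)."""
    A0, A1, A2, A3 = A
    B0, B1, B2, B3 = B
    return [
        B0 * A0 - B1 * A3 - B2 * A2 - B3 * A1,
        B0 * A1 + B1 * A0 - B2 * A3 - B3 * A2,
        B0 * A2 + B1 * A1 + B2 * A0 - B3 * A3,
        B0 * A3 + B1 * A2 + B2 * A1 + B3 * A0,
    ]


def poly_mul_mod_x4_plus_1(A, B):
    """Ordinary polynomial product followed by the reduction x^4 = -1."""
    full = [0] * 7
    for i, ai in enumerate(A):
        for j, bj in enumerate(B):
            full[i + j] += ai * bj
    return [full[k] - (full[k + 4] if k + 4 < 7 else 0) for k in range(4)]


A, B = [0, 0, 6, 2], [0, 0, 7, 4]
print(c_coefficients(A, B))          # [-42, -38, -8, 0]
print(poly_mul_mod_x4_plus_1(A, B))  # [-42, -38, -8, 0]
```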

Since each $C_i$, i = 0, ..., 3 is $C_i = \sum_{4\ \text{terms}} A_k \cdot B_l$ and each
$A_k$, $B_l$ is an n/4-bit representation, then $C_i$, i = 0, ..., 3
requires $n_1 = n/2 + 2$ bits to be represented. This simply implies that

$$C_i \bmod (2^n + 1) = C_i \bmod (2^{n_1} + 1) \quad \text{with} \quad n_1 = \frac{n}{2} + 2 \tag{5.26}$$

As a result, the product $(A \cdot B) \bmod (2^n + 1)$ in equation (5.23)
can be written as

$$\begin{aligned}
(A \cdot B) \bmod (2^n+1) ={}& [(C_0 \bmod (2^{n_1}+1)) \cdot 2^0 + (C_1 \bmod (2^{n_1}+1)) \cdot 2^{n/4} \\
&+ (C_2 \bmod (2^{n_1}+1)) \cdot 2^{n/2} + (C_3 \bmod (2^{n_1}+1)) \cdot 2^{3n/4}] \bmod (2^n+1)
\end{aligned} \tag{5.27}$$
Therefore, the computation of the product $(A \cdot B) \bmod (2^n + 1)$
can be performed through the following algorithm.

Algorithm:

(i) Decompose A and B as

$$A = A_0 \cdot 2^0 + A_1 \cdot 2^{n/4} + A_2 \cdot 2^{n/2} + A_3 \cdot 2^{3n/4}$$

$$B = B_0 \cdot 2^0 + B_1 \cdot 2^{n/4} + B_2 \cdot 2^{n/2} + B_3 \cdot 2^{3n/4}$$

(ii) Take

$$A(x) = A_0 + A_1x + A_2x^2 + A_3x^3$$

$$B(x) = B_0 + B_1x + B_2x^2 + B_3x^3$$

(iii) Choose $n_1 = n/2 + 2$ and consider $p_1 = 2^{n_1} + 1$.

(iv) Perform $(A(x) \cdot B(x) \bmod (x^4 + 1))$ arithmetic mod $p_1$ and
suppose that $A(x) \cdot B(x) \bmod (x^4 + 1) = C_0 + C_1x + C_2x^2 + C_3x^3$
where $C_i \in Z_{p_1}$, i = 0, ..., 3.

Then, the final product $(A \cdot B) \bmod (2^n + 1)$ is

$$(A \cdot B) \bmod (2^n+1) = (C_0 \cdot 2^0 + C_1 \cdot 2^{n/4} + C_2 \cdot 2^{n/2} + C_3 \cdot 2^{3n/4}) \bmod (2^n+1) \tag{5.28}$$

Since the product $A(x) \cdot B(x) \bmod (x^4 + 1)$ is required for the
computation of $(A \cdot B) \bmod (2^n + 1)$, this can be performed
efficiently using a 4th-order SM/PRNS and only four parallel
channels. The modulus choice for such a Single-Modulus PRNS is
$p_1 = 2^{n_1} + 1$ where $n_1 = n/2 + 2$. Figure 5.11 shows the decomposition
of $(A \cdot B) \bmod (2^n + 1)$ in four polynomial channels modulo $(2^{n_1} + 1)$.

The following two examples clearly demonstrate the above
multiplication algorithm.

Figure 5.11 Decomposition of $(A \cdot B) \bmod (2^n + 1)$ in four
polynomial channels
Example 5.1: Perform $(A \cdot B) \bmod (2^n + 1)$, where A = 1408, B = 2496
and n = 12.

The bit forms of numbers A and B are:

A = 010110000000 and B = 100111000000, where the rightmost is
the least significant digit. In byte form, A and B are written as
$A = (A_0, A_1, A_2, A_3) = (0, 0, 6, 2) = 0 \cdot 2^0 + 0 \cdot 2^3 + 6 \cdot 2^6 + 2 \cdot 2^9 = 1408$
and $B = (B_0, B_1, B_2, B_3) = (0, 0, 7, 4) = 0 \cdot 2^0 + 0 \cdot 2^3 + 7 \cdot 2^6 + 4 \cdot 2^9 = 2496$.
Since n = 12 then $n_1 = n/2 + 2 = 8$ and the modulus $p_1 = 2^8 + 1 = 257$
must be chosen for the performance of the polynomial product
$A(x) \cdot B(x) \bmod (x^4 + 1)$. Such a modulus choice $p_1 = 257$ allows the
existence of the 4th-order PRNS isomorphic mapping $f_4$, due to the
fact that $p_1 = 257 = 8 \times 32 + 1$ (equation (5.5)). It also provides
hardware advantages since it is of the form $p_1 = 2^{n_1} + 1$.

The 4th-order PRNS mapping $f_4$ maps $(A_0, A_1, A_2, A_3)$ and
$(B_0, B_1, B_2, B_3)$ onto $(A_0^*, A_1^*, A_2^*, A_3^*)$ and
$(B_0^*, B_1^*, B_2^*, B_3^*)$ respectively, where $A_i^*$ and $B_i^*$,
i = 0, ..., 3 are given by equation (5.9). In this case, $r_0 = 4$ and
$A_i^*$, $B_i^*$ are found to be $A_0^* = 224$, $A_1^* = 225$, $A_2^* = 153$,
$A_3^* = 169$, $B_0^* = 111$, $B_1^* = 113$, $B_2^* = 129$, $B_3^* = 161$. The
polynomial-product coefficients are given in the PRNS by
$C_i^* = (A_i^* \cdot B_i^*) \bmod p_1$, i = 0, ..., 3 or $C_0^* = 192$,
$C_1^* = 239$, $C_2^* = 205$, $C_3^* = 224$.

The inverse 4th-order PRNS mapping $f_4^{-1}$ maps
$(C_0^*, C_1^*, C_2^*, C_3^*)$ onto $C(x) = C_0 + C_1x + C_2x^2 + C_3x^3$ where
$C_i$, i = 0, ..., 3 are given by equation (5.11) and they are

$$C_0 \equiv 215 \equiv -42, \quad C_1 \equiv 219 \equiv -38, \quad C_2 \equiv 249 \equiv -8, \quad C_3 \equiv 0$$

Then according to equation (5.28) the final product should be

$(1408 \times 2496) \bmod (2^{12} + 1) = (-42 \times 2^0 - 38 \times 2^3 - 8 \times 2^6 + 0 \times 2^9) \bmod (2^{12} + 1) = (-42 - 38 \times 8 - 8 \times 64) \bmod 4097 = (-858) \bmod 4097 = 3239$.
This checks correct since $(1408 \times 2496) \bmod 4097 = 3239$.

Example 5.2: Perform $(A \cdot B) \bmod (2^n + 1)$ with A = 50, B = 60, n = 12.

Here $A = (A_0, A_1, A_2, A_3) = (2, 6, 0, 0) = 2 \cdot 2^0 + 6 \cdot 2^3 = 50$,
$B = (B_0, B_1, B_2, B_3) = (4, 7, 0, 0) = 4 \cdot 2^0 + 7 \cdot 2^3 = 60$ and the
product $(A \cdot B) \bmod (2^n + 1)$ is $(50 \times 60) \bmod 4097 = 3000$. The
4th-order SM/PRNS should have a modulus $p_1 = 2^{n_1} + 1$ where
$n_1 = n/2 + 2 = 8$, so $p_1 = 257$. The PRNS mapping $f_4$ maps
$(A_0, A_1, A_2, A_3)$ and $(B_0, B_1, B_2, B_3)$ onto
$(A_0^*, A_1^*, A_2^*, A_3^*)$ and $(B_0^*, B_1^*, B_2^*, B_3^*)$ respectively,
where $A_0^* = 26$, $A_1^* = 235$, $A_2^* = 132$, $A_3^* = 129$, $B_0^* = 32$,
$B_1^* = 233$, $B_2^* = 70$, $B_3^* = 195$. The product is given in the
PRNS by $C_i^*$, i = 0, ..., 3 where $C_i^* = (A_i^* \cdot B_i^*) \bmod p_1$, or
$C_0^* = 61$, $C_1^* = 14$, $C_2^* = 245$, $C_3^* = 226$. The inverse mapping
$f_4^{-1}$ provides $C_i$, i = 0, ..., 3 where $C_0 = 8$, $C_1 = 38$, $C_2 = 42$,
$C_3 = 0$ and the final product is given according to equation (5.28) by
$(50 \times 60) \bmod (2^{12} + 1) = (8 \cdot 2^0 + 38 \cdot 2^3 + 42 \cdot 2^6 + 0 \cdot 2^9) \bmod 4097 = 3000$,
the expected result.
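The two examples can be reproduced end to end with the following Python sketch of the four-channel algorithm. The function name `fermat_mul_prns4` is hypothetical, and `pow` with exponent -1 assumes Python 3.8 or later. Two points are assumptions of this sketch rather than statements of the text: the caller must pick n so that $p_1 = 2^{n/2+2} + 1$ is prime (n = 12 gives 257), and the residues returned by the inverse mapping are lifted to signed integers with a per-coefficient window, since $C_k$ contains k + 1 positive and 3 - k negative byte products. The examples perform this lift implicitly when writing 215 as -42.

```python
def fermat_mul_prns4(A, B, n, r0):
    """(A * B) mod (2^n + 1) through four channels modulo p1 = 2^(n/2+2) + 1."""
    if n % 4:
        raise ValueError("n must be a multiple of four")
    p1 = 2 ** (n // 2 + 2) + 1
    if pow(r0, 4, p1) != p1 - 1:
        raise ValueError("r0 must satisfy r0**4 = -1 (mod p1)")
    w = n // 4                          # byte width in bits
    mask = (1 << w) - 1
    r2, r3 = pow(r0, 2, p1), pow(r0, 3, p1)
    roots = [r0, p1 - r0, p1 - r3, r3]  # r0, -r0, -r0^3, r0^3

    def split(v):
        return [(v >> (w * i)) & mask for i in range(4)]

    def forward(a):
        return [sum(c * pow(x, k, p1) for k, c in enumerate(a)) % p1 for x in roots]

    # Channel-wise products in the PRNS domain
    s0, s1, s2, s3 = [x * y % p1 for x, y in zip(forward(split(A)), forward(split(B)))]

    # Inverse mapping, equation (5.11)
    inv4 = pow(4, -1, p1)
    c = [
        inv4 * (s0 + s1 + s2 + s3) % p1,
        inv4 * (r3 * (s1 - s0) + r0 * (s2 - s3)) % p1,
        inv4 * r2 * (s2 + s3 - s0 - s1) % p1,
        inv4 * (r0 * (s1 - s0) + r3 * (s2 - s3)) % p1,
    ]

    # Signed lift (assumption of this sketch): C_k lies in
    # [-(3-k)*bound, (k+1)*bound], a window narrower than p1
    bound = mask * mask
    lifted = [-(3 - k) * bound + (ck + (3 - k) * bound) % p1 for k, ck in enumerate(c)]

    # Recombination, equation (5.28)
    return sum(ck * 2 ** (w * k) for k, ck in enumerate(lifted)) % (2 ** n + 1)


print(fermat_mul_prns4(1408, 2496, 12, 4))  # 3239
print(fermat_mul_prns4(50, 60, 12, 4))      # 3000
```

The inverse mapping and the final recombination are executed once, after the channel-wise products, which mirrors the remark that the Polynomial CRT is needed only at the very end of the processing.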

The  previous  polynomial  decomposition  of  the  product  (A  • B)  mod 
(2n  + 1)  in  four  polynomial  channels  using  a 4th-order  SM/PRNS,  can 
be  generalized  to  a decomposition  in  N polynomial  channels  using  an 
Nth-order  SM/PRNS. 
Consider the two n-bit integers A and B and the product $(A \cdot B) \bmod m$
where $m = 2^n + 1$. Assume that $n = k \cdot N$ (n is a multiple of
N). Then the integers A and B can be written in the following byte
form:

$$\begin{aligned}
A &= A_0 \cdot 2^0 + A_1 \cdot 2^{n/N} + A_2 \cdot 2^{2n/N} + \cdots + A_{N-1} \cdot 2^{(N-1) \cdot n/N} \\
B &= B_0 \cdot 2^0 + B_1 \cdot 2^{n/N} + B_2 \cdot 2^{2n/N} + \cdots + B_{N-1} \cdot 2^{(N-1) \cdot n/N}
\end{aligned} \tag{5.29}$$

where $A_i$, $B_i$, i = 0, ..., N - 1 are n/N-bit bytes.

Following the same procedure as before, the product $(A \cdot B) \bmod (2^n + 1)$
will be expressed in terms of the coefficients of the polynomial product

$$A(x) \cdot B(x) \bmod (x^N + 1) = C_0 + C_1x + C_2x^2 + \cdots + C_{N-1}x^{N-1} \tag{5.30}$$

where

$$\begin{aligned}
A(x) &= A_0 + A_1x + A_2x^2 + \cdots + A_{N-1}x^{N-1} \\
B(x) &= B_0 + B_1x + B_2x^2 + \cdots + B_{N-1}x^{N-1}
\end{aligned} \tag{5.31}$$

Since each one of the polynomial coefficients $C_\lambda$, $\lambda = 0, 1, \ldots, N - 1$
is expressed as $C_\lambda = \sum_{N\ \text{terms}} A_i \cdot B_j$ and each $A_i$, $B_j$ is an
n/N-bit number, then $C_\lambda$ requires $n_1 = 2 \cdot n/N + \log_2 N$ bits to be
represented. For the specific case n = N, $A_i$ and $B_j$ are 1-bit
numbers, so is the product $A_i \cdot B_j$, and then each $C_\lambda$ has an
$n_2$-bit representation where $n_2 = 1 + \log_2 N$.

The following algorithm describes the decomposition of the
product $(A \cdot B) \bmod (2^n + 1)$ in N polynomial channels using an
Nth-order SM/PRNS.
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Algori thm  5.1: 

Consider  two  n-bit  integers  A and  B,  where 

A = a0  + ai  • 21  + a2  • 22  + ...  + a n-1  ■ 2n~l  (5.32) 

B = bg  + bj  • 2-*-  + b2  * 22  + ...  + bn_i  • 2n~ 1 

a i i b^  c {0,  1)>  i = 0 > 1,  . . . f n — 1 

Case  (i);  n = k • N where  k * 1 and  integer. 

1.  Decompose  A and  B in  byte  form  as 

A = Aq  * 2°  + A}  • 2n/N  + A2  • 22n/N  + ...  + AN-1  • 2(n_1) ‘n/N(5.33) 

B = B0  • 2°  + • 2n/N  + B2  • 22n/N  + ...  + Bn_!  • 2(N~1),n/N 

where  A^,  B^  are  n/N-bit  numbers. 

2.  Take 

A(x)  = A0  + Axx  + ...  + A^iX^1  (5.34) 

B(x)  = Bq  + B^x  + . . . + B^_2xN~1 

3.  Choose  an  n^-bit  modulus  pj  where 


n^  = 2 • n/N  + log2  N 


(5.35) 


4.  Perform  (A(x)  • B(x)  mod  (xN  + 1))  arithmetic  modpi  and  suppose 
that 


A(x)  • B(x)  mod  (xN  + 1)  = C0  + Cjx  + ...  + C^x^1 


(5.36) 


139 


where  Cj  e Zp^,  i = 0,  1,  . N - 1.  Then  the  product  (A  • B) 
mod  (2n  + 1)  is 

(A  • B)  mod  2n  + 1 = (Cq  • 2°  + Cj_  • 2n/N  + C2  • 22n/N  + (5.37) 

•••  + Cn-1  ' 2(N-l)*n/Nj  mod  (2n  + 1) 


Case  (ii):  n = N 

1 .  Take 


A(x)  = a0  + axx  + ...  + a^x0"1  (5.38) 

B(x)  = bQ  + b^x  + . . . + bn_ixn~l 

where  a^,  b^  e {0,  1),  i = 0,  . ..,  n - 1 and  are  defined  by 

equation  (5.32). 

2.  Choose  an  n2-bit  modulus  p2  where 

n2  = 1 + log2  n (5.39) 

3.  Perform  (A(x)  • B(x)  mod  (xn  + 1))  arithmetic  modp2  and  suppose 
that 

A(x)  • B(x)  mod  (x11  + 1)  = cq  + cjx  + ...  + cn_1xn~1  (5.40) 

where  c^  s Zp2,  i = 0,  1,  ...,  n - 1.  Then  the  product  (A  • B) 

mod  (2n  + 1)  is 
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(A  • B)  mod  (2n  + 1)  = (c0  • 2°  + cj  • 21  + c2  • 22  + (5.41) 

...  + cn_i  • 2n_l)  mod  (2n  + 1) 
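The decomposition of Algorithm 5.1 can be sketched in a few lines of
Python. This is only an illustration under stated assumptions: the
helper names (`split_bytes`, `negacyclic_product`,
`product_mod_fermat`) are hypothetical, and the polynomial product is
formed directly over the integers with signed coefficients rather than
through the PRNS mapping and a modulus $p_1$, so it checks the
arithmetic identity behind equations (5.37) and (5.41), not the
hardware structure.

```python
def split_bytes(value, n, N):
    """Split an n-bit integer into N chunks of n/N bits, least significant first."""
    width = n // N
    mask = (1 << width) - 1
    return [(value >> (j * width)) & mask for j in range(N)]


def negacyclic_product(a, b):
    """Coefficients of a(x) * b(x) mod (x^N + 1), computed over the integers."""
    N = len(a)
    c = [0] * N
    for i in range(N):
        for j in range(N):
            k = i + j
            if k < N:
                c[k] += a[i] * b[j]
            else:
                c[k - N] -= a[i] * b[j]  # wrap-around term: x^N = -1
    return c


def product_mod_fermat(A, B, n, N):
    """(A * B) mod (2^n + 1) rebuilt from the N polynomial coefficients."""
    assert n % N == 0, "n must be a multiple of N"
    width = n // N
    c = negacyclic_product(split_bytes(A, n, N), split_bytes(B, n, N))
    # Evaluate C(x) at x = 2^(n/N); coefficients may be negative.
    total = sum(cj * (1 << (j * width)) for j, cj in enumerate(c))
    return total % ((1 << n) + 1)


print(product_mod_fermat(200, 10, 12, 4))  # n = 12, N = 4 (Case (i))
```

Setting `N = n` makes every chunk a single bit, which corresponds to
Case (ii).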

To get a feeling for the advantages obtained by decomposing the
product $(A \cdot B) \bmod (2^n + 1)$ in $N$ polynomial channels,
consider first the case of table lookup techniques. The implementation
of the product $(A \cdot B) \bmod m$, where $m = 2^n + 1$, using one
modulus channel with modulus choice $m$, would require
$2^{2n} \times n$ bits of memory. One way of decomposing the above
product into more than one channel of smaller wordlength is to use a
multi-moduli system. The most popular multi-moduli system from a
hardware-design-elegance point of view (easy scaling, easy residue to
decimal conversion) is the so-called 3-moduli RNS, based on the moduli
choice $\{2^\lambda - 1, 2^\lambda, 2^\lambda + 1\}$ [Tay82a]. To
maintain the same dynamic range provided by the original modulus $m$,
the wordlength $\lambda$ of each one of the three channels of our
3-moduli RNS system should be $\lambda = n/3$ if $n/3$ is an integer,
or $\lambda = [n/3] + 1$ if $n/3$ is not an integer, where $[\cdot]$
denotes integer part. As a result, the overall memory requirement for
a product $(A \cdot B) \bmod (2^n + 1)$, using the 3-moduli system,
would be $3 \times 2^{2\lambda} \times \lambda$ bits. Finally,
according to Algorithm 5.1, the decomposition of the product
$(A \cdot B) \bmod (2^n + 1)$ in $N$ polynomial channels would require
$N$ channels of wordlength $n_1$, where
$n_1 = 2 \cdot n/N + \log_2 N$; a total memory requirement of
$N \times 2^{4 n/N + 2 \log_2 N} \times (2 \cdot n/N + \log_2 N)$ bits
for the case where $n = k \cdot N$, $k \neq 1$ and integer. For the
case where $n = N$, we would have $n$ channels, each one of wordlength
$n_2 = 1 + \log_2 n$, or a total memory requirement of
$n \times 2^{2 + 2 \log_2 n} \times (1 + \log_2 n)$ bits. Table 5.10
shows the memory requirements for table lookup implementations of the
product $(A \cdot B) \bmod (2^n + 1)$ using the decomposition
techniques described above. Table 5.11 gives specific numerical values
of the memory bits required for various values of $n$ and $N$.
Obviously, either one of the decompositions of the product
$(A \cdot B) \bmod (2^n + 1)$, using a 3-moduli RNS or the Nth-order
SM/PRNS, results in a reduction of the memory bits required for table
lookup implementations. Besides this, for large wordlengths
($n \geq 40$) and in the case where the order of the PRNS is
$N \geq 8$, the polynomial decomposition can provide significant
savings over the 3-moduli RNS decomposition. Table 5.11 shows that
such savings can vary between 38% and 99%.
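The memory formulas quoted above are easy to evaluate numerically. The
following Python sketch is an illustration only; the function names
are hypothetical, and $N$ is assumed to be a power of two so that
$\log_2 N$ is an integer, as in all the table entries.

```python
def bits_single(n):
    """One lookup table for modulus 2^n + 1: 2^(2n) x n bits."""
    return 2 ** (2 * n) * n


def bits_three_moduli(n):
    """Three channels of wordlength lam: 3 x 2^(2 lam) x lam bits."""
    lam = n // 3 if n % 3 == 0 else n // 3 + 1
    return 3 * 2 ** (2 * lam) * lam


def bits_prns(n, N):
    """N polynomial channels of wordlength n1 (or n2 when n == N)."""
    assert N > 0 and N & (N - 1) == 0, "N is assumed to be a power of two"
    assert n % N == 0
    log_n = N.bit_length() - 1                    # log2 N
    width = 1 + log_n if n == N else 2 * (n // N) + log_n
    return N * 2 ** (2 * width) * width


def savings_percent(n, N):
    """Relative saving of the PRNS tables over the 3-moduli RNS tables."""
    return 100.0 * (1 - bits_prns(n, N) / bits_three_moduli(n))


for n, N in [(12, 4), (32, 8), (40, 8), (64, 16)]:
    print((n, N), bits_three_moduli(n), bits_prns(n, N),
          round(savings_percent(n, N), 2))
```

A negative saving simply means that, for that pair $(n, N)$, the
3-moduli RNS needs fewer table bits than the polynomial decomposition.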

For a conventional hardware design approach, Table 5.3 shows that the
system throughput is higher if the channel wordlength is smaller. It
can be observed that for $n \geq 40$ and $N \geq 8$, the channel
wordlength in each one of the $N$ PRNS channels is smaller than the
channel wordlength in each of the 3-moduli RNS channels. As a result,
higher throughputs can be achieved using the polynomial decomposition
of $(A \cdot B) \bmod (2^n + 1)$. Table 5.12 shows the required
channel wordlengths for the two decomposition techniques for
$n \geq 40$ and $N \geq 8$.

The overall result is that the product $(A \cdot B) \bmod (2^n + 1)$
can be effectively computed using an Nth-order SM/PRNS when
$n \geq 40$ and $N \geq 8$. To take full advantage of such a
polynomial decomposition, an elegant scaling policy for the SM/PRNS
must be found for the case of real arithmetic. This is the subject of
the next section.
Table 5.10  Memory requirements for table look-up implementations
of the product $(A \cdot B) \bmod m$ with $m = 2^n + 1$.
$[\cdot]$ denotes integer part.

Way of decomposition                                        Memory requirements in bits
---------------------------------------------------------   ---------------------------------------------------------------
Single channel RNS with modulus choice $m = 2^n + 1$         $2^{2n} \times n$
3-moduli RNS $\{2^\lambda - 1, 2^\lambda, 2^\lambda + 1\}$   $3 \times 2^{2n/3} \times n/3$
  where $3 \mid n$
3-moduli RNS $\{2^\lambda - 1, 2^\lambda, 2^\lambda + 1\}$   $3 \times 2^{2[n/3] + 2} \times ([n/3] + 1)$
  where $3 \nmid n$
SM/PRNS of order $N$ where $n = k \cdot N$,                  $N \times 2^{4 n/N + 2 \log_2 N} \times (2 \cdot n/N + \log_2 N)$
  $k \neq 1$ and integer
SM/PRNS of order $N$ where $n = N$                           $n \times 2^{2 + 2 \log_2 n} \times (1 + \log_2 n)$
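Table 5.11  Memory-bits requirements for the product
$(A \cdot B) \bmod (2^n + 1)$ for various values of (n,N).
An X marks the cases where the PRNS requires less memory than the
3-moduli RNS.

(n,N)     Single channel RNS     3-moduli RNS           SM/PRNS of            PRNS requiring     % savings of
          with modulus choice                           order N               less memory than   PRNS vs.
          $m = 2^n + 1$                                                       3-moduli RNS       3-moduli RNS
-------   --------------------   --------------------   -------------------   ----------------   ------------
(4,4)     $2^{8} \times 4$       $6 \times 2^{4}$       $12 \times 2^{6}$
(8,4)     $2^{16} \times 8$      $9 \times 2^{6}$       $24 \times 2^{12}$
(12,4)    $2^{24} \times 12$     $12 \times 2^{8}$      $32 \times 2^{16}$
(16,4)    $2^{32} \times 16$     $18 \times 2^{12}$     $40 \times 2^{20}$
(20,4)    $2^{40} \times 20$     $21 \times 2^{14}$     $48 \times 2^{24}$
(24,4)    $2^{48} \times 24$     $24 \times 2^{16}$     $56 \times 2^{28}$
(28,4)    $2^{56} \times 28$     $30 \times 2^{20}$     $64 \times 2^{32}$
(32,4)    $2^{64} \times 32$     $33 \times 2^{22}$     $72 \times 2^{36}$
(8,8)     $2^{16} \times 8$      $9 \times 2^{6}$       $32 \times 2^{8}$
(16,8)    $2^{32} \times 16$     $18 \times 2^{12}$     $56 \times 2^{14}$
(24,8)    $2^{48} \times 24$     $24 \times 2^{16}$     $72 \times 2^{18}$
(32,8)    $2^{64} \times 32$     $33 \times 2^{22}$     $88 \times 2^{22}$
(40,8)    $2^{80} \times 40$     $42 \times 2^{28}$     $104 \times 2^{26}$   X                  38.09
(48,8)    $2^{96} \times 48$     $48 \times 2^{32}$     $120 \times 2^{30}$   X                  37.50
(56,8)    $2^{112} \times 56$    $57 \times 2^{38}$     $136 \times 2^{34}$   X                  85.96
(64,8)    $2^{128} \times 64$    $66 \times 2^{44}$     $152 \times 2^{38}$   X                  96.40
(16,16)   $2^{32} \times 16$     $18 \times 2^{12}$     $80 \times 2^{10}$
(32,16)   $2^{64} \times 32$     $33 \times 2^{22}$     $128 \times 2^{16}$   X                  93.90
(48,16)   $2^{96} \times 48$     $48 \times 2^{32}$     $160 \times 2^{20}$   X                  99.91
(64,16)   $2^{128} \times 64$    $66 \times 2^{44}$     $192 \times 2^{24}$   X                  99.99
(32,32)   $2^{64} \times 32$     $33 \times 2^{22}$     $192 \times 2^{12}$   X                  99.43
(64,32)   $2^{128} \times 64$    $66 \times 2^{44}$     $288 \times 2^{18}$   X                  99.99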
Table 5.12  Channel wordlengths for the two different decomposition
techniques of $(A \cdot B) \bmod (2^n + 1)$

(n,N)     Width in bits per channel in     Width in bits per channel in a
          a 3-moduli RNS decomposition     SM/PRNS of order N decomposition
-------   ------------------------------   --------------------------------
(40,8)    14                               13
(48,8)    16                               15
(56,8)    19                               17
(64,8)    22                               19
(16,16)   6                                5
(32,16)   11                               8
(48,16)   16                               10
(64,16)   22                               12
(32,32)   11                               6
(64,32)   22                               9
5.2.2  A Scaling Technique for the Single Modulus Real Arithmetic PRNS Units

An elegant scaling policy, based on the SM/PRNS as a processing
system, will now be presented for sum-of-products computations of the
type $\left(\sum_i A^i \cdot B^i\right) \bmod (2^n + 1)$, where
$A^i, B^i \in Z_{2^n+1}$. Unfortunately, this scaling policy is not
general. It is applicable only for scale factors which are powers of
two, and only for specific moduli of the PRNS system. The following
algorithm describes the scaling policy.

Algorithm 5.2:

Consider the computation of the sum of products

$$\left(\sum_{i=0}^{\lambda} A^i \cdot B^i\right) \bmod (2^n + 1), \quad \text{where } A^i, B^i \in Z_{2^n+1}, \quad i = 0, 1, \dots, \lambda.$$

Such a sum of products can be efficiently computed (according to
Algorithm 5.1) using an Nth-order SM/PRNS which makes use of an
$n_1$-bit modulus $p_1$, where $n_1 = 2 \cdot n/N + \log_2 N$ if
$n = k \cdot N$, $k \neq 1$, integer, and $n_1 = 1 + \log_2 N$ if
$n = N$.

Our scaling algorithm considers only the case where
$p_1 = 2^{n_1} + 1$. Suppose that we want to scale down the above sum
of products $C = \sum_{i=0}^{\lambda} A^i \cdot B^i$ by a scale factor
$K = 2^{2m}$, or in other words, compute $\tilde{C} = [C/K]$, where
$[\cdot]$ denotes rounding to the closest integer. The following steps
should be taken.

Step 1: Decompose $A^i$ and $B^i$ in byte form (N bytes), as described
by equation (5.33), and get

$$A^i = (A_0^i, A_1^i, \dots, A_{N-1}^i) \qquad (5.42)$$

$$B^i = (B_0^i, B_1^i, \dots, B_{N-1}^i)$$

$$i = 0, 1, \dots, \lambda$$

Step 2: Using an Nth-order PRNS mapping (equation (4.7)) and a PRNS
modulus

$$p_1 = 2^{n_1} + 1 \qquad (5.43)$$

where $n_1 \geq 2 \cdot n/N + \log_2 N$ in case $n = k \cdot N$,
$k \neq 1$ and integer, and $n_1 \geq 1 + \log_2 n$ in case $n = N$,
with $n_1$ such that the Nth-order PRNS mapping $f_N$ exists, perform
the following PRNS mappings:

$$(A_0^i, A_1^i, \dots, A_{N-1}^i) \xrightarrow{f_N} (A_0^{i*}, A_1^{i*}, \dots, A_{N-1}^{i*}) \qquad (5.44)$$

$$(B_0^i, B_1^i, \dots, B_{N-1}^i) \xrightarrow{f_N} (B_0^{i*}, B_1^{i*}, \dots, B_{N-1}^{i*})$$

$$i = 0, 1, \dots, \lambda$$

Step 3: Compute the following:

$$\tilde{A}_j^{i*} = (2^{-m} \cdot A_j^{i*}) \bmod p_1 \qquad (5.45)$$

$$\tilde{B}_j^{i*} = (2^{-m} \cdot B_j^{i*}) \bmod p_1$$

where $i = 0, 1, \dots, \lambda$ and $j = 0, 1, \dots, N - 1$. Notice
that $2^{-m}$ is the multiplicative inverse of $2^m$ mod $p_1$, so it
is an integer belonging to $Z_{p_1}$.

Step 4: Compute $\tilde{C}_0^*, \tilde{C}_1^*, \dots, \tilde{C}_{N-1}^*$,
where

$$\tilde{C}_j^* = \left(\sum_{i=0}^{\lambda} \tilde{A}_j^{i*} \cdot \tilde{B}_j^{i*}\right) \bmod p_1, \quad j = 0, 1, \dots, N - 1 \qquad (5.46)$$

Step 5: Using the Nth-order inverse PRNS mapping $f_N^{-1}$ (equations
(4.8), (4.9)), perform the mapping

$$(\tilde{C}_0^*, \tilde{C}_1^*, \dots, \tilde{C}_{N-1}^*) \xrightarrow{f_N^{-1}} (\tilde{C}_0, \tilde{C}_1, \dots, \tilde{C}_{N-1}) \qquad (5.47)$$

Step 6: Compare every $\tilde{C}_j$, $j = 0, \dots, N - 1$, with
$2^{n_1 - 2m}$. If $\tilde{C}_j < 2^{n_1 - 2m}$ then leave
$\tilde{C}_j$ as it is (equivalently, $q_j = 0$), or

$$\tilde{C}_j < (2^{n_1 - 2m}) \bmod p_1 \ \text{ then } \ \tilde{C}_j \leftarrow \tilde{C}_j \qquad (5.48)$$

while if $\tilde{C}_j \geq 2^{n_1 - 2m}$ then find the smallest integer
$q_j$ such that
$(\tilde{C}_j + q_j \cdot 2^{n_1 - 2m}) \bmod p_1 < 2^{n_1 - 2m}$, or

$$\tilde{C}_j \geq (2^{n_1 - 2m}) \bmod p_1 \ \text{ then } \ \tilde{C}_j \leftarrow (\tilde{C}_j + q_j \cdot 2^{n_1 - 2m}) \bmod p_1 \qquad (5.49)$$

Step 7: The scaled sum of products is finally given by

$$\tilde{C} = \left[\frac{C}{2^{2m}}\right] = \left\{ \tilde{C}_0 \cdot 2^0 + \left(q_0 \cdot \frac{1}{2^{2m}} \cdot 2^0\right) + \tilde{C}_1 \cdot 2^{n/N} + \left(q_1 \cdot \frac{1}{2^{2m}} \cdot 2^{n/N}\right) + \tilde{C}_2 \cdot 2^{2n/N} + \left(q_2 \cdot \frac{1}{2^{2m}} \cdot 2^{2n/N}\right) + \dots + \tilde{C}_{N-1} \cdot 2^{(N-1) n/N} + \left(q_{N-1} \cdot \frac{1}{2^{2m}} \cdot 2^{(N-1) n/N}\right) \right\} \bmod (2^n + 1) \qquad (5.50)$$


The following two examples will demonstrate the validity of the above
scaling algorithm.

Example 5.3: Perform $C = (A + B) \bmod (2^n + 1)$, where $A = 200$,
$B = 10$, $n = 12$, and perform scaling by 2. Use a 4th-order SM/PRNS.

In this case $n = 12$, $N = 4$, and $n/N = 3$. Therefore, the byte
forms of the numbers A and B are:

$$A = 200 = (A_0, A_1, A_2, A_3) = (0, 1, 3, 0) = 0 \cdot 2^0 + 1 \cdot 2^3 + 3 \cdot 2^6 + 0 \cdot 2^9$$

$$B = 10 = (B_0, B_1, B_2, B_3) = (2, 1, 0, 0) = 2 \cdot 2^0 + 1 \cdot 2^3 + 0 \cdot 2^6 + 0 \cdot 2^9$$

The 4th-order PRNS should have a modulus $p_1 = 2^{n_1} + 1$ with
$n_1 \geq 2 \cdot n/N + \log_2 N = 2 \cdot 3 + 2 = 8$. The choice
$p_1 = 2^8 + 1 = 257$ is a good choice, because 257 is a prime number
and also $p_1 = 257 = 8 \cdot 32 + 1$, so according to equation (5.5)
the 4th-order PRNS mapping exists.

The 4th-order mapping $f_4$ maps $(A_0, A_1, A_2, A_3)$ and
$(B_0, B_1, B_2, B_3)$ onto $(A_0^*, A_1^*, A_2^*, A_3^*)$ and
$(B_0^*, B_1^*, B_2^*, B_3^*)$ respectively, where $A_i^*$, $B_i^*$,
$i = 0, \dots, 3$, are given by equation (5.9). In this case $r_0 = 4$
(because $r_0^4 = 4^4 = 256 \equiv -1 \bmod 257$), and $A_i^*$,
$B_i^*$ are found to be: $A_0^* = 52$, $A_1^* = 44$, $A_2^* = 145$,
$A_3^* = 16$, $B_0^* = 6$, $B_1^* = 255$, $B_2^* = 195$,
$B_3^* = 66$.

According to equation (5.45), all the entries $A_i^*$ and $B_i^*$ have
to be multiplied by $2^{-1} \bmod 257 = 129$ to get
$\tilde{A}_i^* = (129 \cdot A_i^*) \bmod 257$ and
$\tilde{B}_i^* = (129 \cdot B_i^*) \bmod 257$, or
$\tilde{A}_0^* = 26$, $\tilde{A}_1^* = 22$, $\tilde{A}_2^* = 201$,
$\tilde{A}_3^* = 8$, $\tilde{B}_0^* = 3$, $\tilde{B}_1^* = 256$,
$\tilde{B}_2^* = 226$, $\tilde{B}_3^* = 33$.

Performing the addition in the PRNS we get
$\tilde{C}_i^* = (\tilde{A}_i^* + \tilde{B}_i^*) \bmod 257$, or
$\tilde{C}_0^* = 29$, $\tilde{C}_1^* = 21$, $\tilde{C}_2^* = 170$,
$\tilde{C}_3^* = 41$.

The inverse mapping $f_4^{-1}$ maps
$(\tilde{C}_0^*, \tilde{C}_1^*, \tilde{C}_2^*, \tilde{C}_3^*)$ onto
$(\tilde{C}_0, \tilde{C}_1, \tilde{C}_2, \tilde{C}_3)$, where
$\tilde{C}_i$, $i = 0, \dots, 3$, are given by equation (5.11) and
they are: $\tilde{C}_0 = 1$, $\tilde{C}_1 = 1$, $\tilde{C}_2 = 130$,
$\tilde{C}_3 = 0$.

According to Step 6 of Algorithm 5.2, in order to find the correct
values of $\tilde{C}_i$ to be used by equation (5.50), each
$\tilde{C}_i$ should be compared to a threshold. Because only a sum
(not a product) of scaled operands is formed here, the overall scale
factor is $2^m = 2$ rather than $2^{2m}$, so the threshold is
$2^{n_1 - m} = 2^{8-1} = 2^7 = 128$. In our particular example,
$\tilde{C}_0$, $\tilde{C}_1$, and $\tilde{C}_3$ do not need to be
changed because they are less than 128. For
$\tilde{C}_2 = 130 > 128$, the smallest integer $q_2$ that satisfies
$(\tilde{C}_2 + q_2 \cdot 128) \bmod 257 < 128$ is $q_2 = 1$. In this
case, the new value of $\tilde{C}_2$ is
$\tilde{C}_2 = (130 + 1 \cdot 128) \bmod 257 = 258 \bmod 257 = 1$. So
the correct values of $\tilde{C}_i$ and $q_i$, $i = 0, \dots, 3$, are
$\tilde{C}_0 = 1$, $\tilde{C}_1 = 1$, $\tilde{C}_2 = 1$,
$\tilde{C}_3 = 0$, $q_0 = 0$, $q_1 = 0$, $q_2 = 1$, $q_3 = 0$.

The final scaled result is given by equation (5.50) as:
$\tilde{C} = [1 \cdot 2^0 + 1 \cdot 2^3 + 1 \cdot 2^6 + (1 \cdot 1/2 \cdot 2^6)] \bmod 4097 = (1 + 8 + 64 + 32) \bmod 4097 = 105$,
which is the correct result since
$[(A + B)/2] = [210/2] = [105] = 105$.

Example 5.4: Now repeat the same example for the case of
multiplication and perform $C = (A \cdot B) \bmod (2^n + 1)$ with
$A = 200$, $B = 10$, $n = 12$. Scale by a scale factor of 2 on each
one of A and B, so that the entire product is scaled down by 4. The
expected result is
$\tilde{C} = [(A \cdot B)/4] = [2000/4] = [500] = 500$.

The byte decomposition of A and B gives
$A = (A_0, A_1, A_2, A_3) = (0, 1, 3, 0)$ and
$B = (B_0, B_1, B_2, B_3) = (2, 1, 0, 0)$, while the application of
the PRNS gives the same values of $A_i^*$, $B_i^*$ as before, or
$(A_0^*, A_1^*, A_2^*, A_3^*) = (52, 44, 145, 16)$ and
$(B_0^*, B_1^*, B_2^*, B_3^*) = (6, 255, 195, 66)$. The multiplication
of each $A_i^*$ and $B_i^*$ by $2^{-1} \bmod 257 = 129$ gives,
according to the previous example,
$(\tilde{A}_0^*, \tilde{A}_1^*, \tilde{A}_2^*, \tilde{A}_3^*) = (26, 22, 201, 8)$
and
$(\tilde{B}_0^*, \tilde{B}_1^*, \tilde{B}_2^*, \tilde{B}_3^*) = (3, 256, 226, 33)$.

The component-wise multiplication in the PRNS returns
$\tilde{C}_i^* = (\tilde{A}_i^* \cdot \tilde{B}_i^*) \bmod 257$, or
$\tilde{C}_0^* = 78$, $\tilde{C}_1^* = 235$, $\tilde{C}_2^* = 194$,
$\tilde{C}_3^* = 7$.

The inverse mapping $f_4^{-1}$ results in
$(\tilde{C}_0, \tilde{C}_1, \tilde{C}_2, \tilde{C}_3) = (0, 129, 66, 65)$.

Since the final goal is to scale the overall product by a scale factor
of $2^2$, each one of the $\tilde{C}_i$ should be compared to
$2^{n_1 - 2m} = 2^{8-2} = 2^6 = 64$. The value $\tilde{C}_0 = 0$
should remain the same because $0 < 64$. The other three components
$\tilde{C}_1$, $\tilde{C}_2$, $\tilde{C}_3$ are all greater than 64
and need to be changed. The smallest integers $q_j$ that satisfy
$(\tilde{C}_j + q_j \cdot 64) \bmod 257 < 64$, $j = 1, 2, 3$, are
found to be $q_1 = 2$, $q_2 = 3$, $q_3 = 3$. The new values of
$\tilde{C}_j$, $j = 1, 2, 3$, are given by
$\tilde{C}_j \leftarrow (\tilde{C}_j + q_j \cdot 64) \bmod 257$, or
$\tilde{C}_1 \leftarrow (129 + 2 \cdot 64) \bmod 257$,
$\tilde{C}_2 \leftarrow (66 + 3 \cdot 64) \bmod 257$,
$\tilde{C}_3 \leftarrow (65 + 3 \cdot 64) \bmod 257$, which finally
gives $\tilde{C}_1 = 0$, $\tilde{C}_2 = 1$, $\tilde{C}_3 = 0$.

Using equation (5.50), we obtain the final product scaled by $2^2$:

$$\tilde{C} = \left\{ \tilde{C}_0 \cdot 2^0 + \left(q_0 \cdot \frac{1}{2^2} \cdot 2^0\right) + \tilde{C}_1 \cdot 2^3 + \left(q_1 \cdot \frac{1}{2^2} \cdot 2^3\right) + \tilde{C}_2 \cdot 2^6 + \left(q_2 \cdot \frac{1}{2^2} \cdot 2^6\right) + \tilde{C}_3 \cdot 2^9 + \left(q_3 \cdot \frac{1}{2^2} \cdot 2^9\right) \right\} \bmod 4097$$

$$= \left\{ 0 + 0 + 0 + \left(2 \cdot \frac{1}{2^2} \cdot 2^3\right) + 1 \cdot 2^6 + \left(3 \cdot \frac{1}{2^2} \cdot 2^6\right) + 0 + \left(3 \cdot \frac{1}{2^2} \cdot 2^9\right) \right\} \bmod 4097 = 500$$

This is the expected result.
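Examples 5.3 and 5.4 can be reproduced with a short Python sketch. It
rests on assumptions that are not taken from the text: the forward
mapping is modelled as evaluation of the byte polynomial at the four
roots of $x^4 + 1$ modulo 257 (ordered so that the values match the
examples), the inverse mapping as the corresponding interpolation, and
all function names are hypothetical. It also assumes non-negative
polynomial coefficients, as in the two examples, and needs Python 3.8
or later for modular inverses via `pow`.

```python
from fractions import Fraction
from math import floor

P = 257                      # PRNS modulus 2^8 + 1
ROOTS = [4, 253, 193, 64]    # roots of x^4 + 1 mod 257, ordered as in Example 5.3
N, WIDTH, N1 = 4, 3, 8       # order, bits per byte (n/N), P = 2^N1 + 1
M = (1 << (N * WIDTH)) + 1   # 2^12 + 1 = 4097


def split_bytes(value):
    return [(value >> (WIDTH * j)) & ((1 << WIDTH) - 1) for j in range(N)]


def forward(coeffs):
    """Assumed forward mapping: evaluate the polynomial at each root, mod P."""
    return [sum(c * pow(r, j, P) for j, c in enumerate(coeffs)) % P for r in ROOTS]


def inverse(values):
    """Assumed inverse mapping: interpolate back to the coefficients, mod P."""
    n_inv = pow(N, -1, P)
    return [n_inv * sum(v * pow(r, -j, P) for v, r in zip(values, ROOTS)) % P
            for j in range(N)]


def finish(c_tilde, s):
    """Steps 6 and 7 for an overall scale factor 2^s."""
    t = 1 << (N1 - s)                         # comparison threshold
    total = Fraction(0)
    for j, c in enumerate(c_tilde):
        q = next(q for q in range(P) if (c + q * t) % P < t)   # smallest q_j
        c_new = (c + q * t) % P
        total += (c_new + Fraction(q, 1 << s)) * (1 << (WIDTH * j))
    return floor(total + Fraction(1, 2)) % M  # round half up, then reduce


def scaled_operands(a, b, m):
    inv = pow(2, -m, P)                       # 2^(-m) mod P
    return ([inv * v % P for v in forward(split_bytes(a))],
            [inv * v % P for v in forward(split_bytes(b))])


def scaled_sum(a, b, m=1):
    xs, ys = scaled_operands(a, b, m)
    return finish(inverse([(x + y) % P for x, y in zip(xs, ys)]), m)


def scaled_product(a, b, m=1):
    xs, ys = scaled_operands(a, b, m)
    return finish(inverse([x * y % P for x, y in zip(xs, ys)]), 2 * m)


print(scaled_sum(200, 10), scaled_product(200, 10))
```

The channel-wise operations between `scaled_operands` and `inverse`
are the only part that would run inside the parallel PRNS; `finish`
corresponds to the final, single use of Steps 6 and 7.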

The previously described scaling procedure returns the scaled-down
result rounded to the closest integer. In the event that the
scaled-down result has a fractional part equal to 0.5, it is rounded
to the next larger integer (i.e., a.5 is rounded to a + 1).

The scaling algorithm has been simulated and error variances have been
computed. Table 5.13 shows values of the variance of the error due to
scaling by a scale factor of $2^m$ for the case of $n = 12$ and
$N = 4$. The modulus used by the 4th-order SM/PRNS is
$p_1 = 2^8 + 1 = 257$.

Figure 5.12 shows (in block diagram form) a unit that can compute a
sum of products
$\left(\sum_{i=0}^{\lambda} A^i \cdot B^i\right) \bmod (2^n + 1)$
using an Nth-order SM/PRNS. This unit provides, at the same time,
scaling of the overall result by a scale factor $2^{2m}$.

Figure 5.12  Scaling in the Real Arithmetic SM/PRNS

Table 5.13  Error variance due to scaling by $2^m$ for n = 12, N = 4

m     Error Variance
---   --------------
1     0.312500
2     0.281250
3     0.273438
4     0.271484
5     0.270621
6     0.272093
7     0.271195
8     0.270852

The main advantage of this scaling technique in the Real Arithmetic
SM/PRNS is that the scaling can be mainly performed inside the
parallel PRNS, without using the Polynomial CRT (inverse PRNS mapping
$f_N^{-1}$) every time we want to scale down a partial result. It
suffices to multiply each PRNS entry by $2^{-m} \bmod p_1$ every time
that scaling is employed for overflow prevention (a fairly simple
operation since $2^{-m} \bmod p_1$ is a constant) and use the
Polynomial CRT ($f_N^{-1}$) only once, at the very end. The
implementation of the inverse mapping would require, in general, a
considerable amount of hardware and at the same time it would be a
fairly slow process for large values of $N$. Therefore, high
performance can be achieved by this scaling algorithm in a reduced
hardware environment, which avoids multiple use of the Polynomial CRT.
Of course, the last two steps (Steps 6 and 7) of Algorithm 5.2 are not
really trivial, since some comparisons and a large mod $(2^n + 1)$
adder are required. Nevertheless, since these steps are used only
once, at the very end, we can still design faster and cheaper systems
compared to the ones achieved by making multiple use of the Polynomial
CRT. Two more things should be stated at this point. First, in Step 6
of Algorithm 5.2, the $2m$ most-significant digits of the originally
obtained $\tilde{C}_j$ can determine the new values of $\tilde{C}_j$
as well as the values of $q_j$. Second, the maximum scale factor that
can be tolerated by such a scaling algorithm is $2^{n_1}$, where $n_1$
is such that $p_1 = 2^{n_1} + 1$, $p_1$ being the PRNS modulus.

5.2.3  Polynomial Decomposition of the Product $(A \cdot B) \bmod (2^n - 1)$

Another interesting case where the SM/PRNS can be used for real
arithmetic is the decomposition of the product
$(A \cdot B) \bmod (2^n - 1)$ in $N$ polynomial channels using a
smaller modulus. The advantages obtained are those discussed in
Section 5.2.1 and the theoretical analysis is identical to the case of
the product $(A \cdot B) \bmod (2^n + 1)$. The only difference is that
the polynomial products are considered here modulo $(x^N - 1)$ instead
of modulo $(x^N + 1)$. This happens because
$2^n \bmod (2^n - 1) \equiv 1$. The mathematical tool to be used here
is the Nth-order SM/PRNS based on the factorization of $x^N - 1$,
instead of the factorization of $x^N + 1$ that was used in Section
5.2.1. The theory of such a PRNS system is extensively presented in
Section 4.4 and is not repeated here.

The following algorithm describes the decomposition of the product
$(A \cdot B) \bmod (2^n - 1)$ in $N$ polynomial channels, using the
PRNS developed in Section 4.4.

Algorithm 5.3:

Consider two n-bit integers A and B, where A and B are given by
equation (5.32).

Case (i): $n = k \cdot N$ where $k \neq 1$ and integer

Steps 1,2,3: Identical to Steps 1,2,3 of Algorithm 5.1, Case (i).

Step 4: Perform $(A(x) \cdot B(x) \bmod (x^N - 1))$ arithmetic mod
$p_1$ and suppose that

$$A(x) \cdot B(x) \bmod (x^N - 1) = C_0 + C_1 x + \dots + C_{N-1} x^{N-1} \qquad (5.51)$$

where $C_i \in Z_{p_1}$, $i = 0, 1, \dots, N - 1$.

Then, the final product $(A \cdot B) \bmod (2^n - 1)$ is given by

$$(A \cdot B) \bmod (2^n - 1) = \left(C_0 \cdot 2^0 + C_1 \cdot 2^{n/N} + C_2 \cdot 2^{2n/N} + \dots + C_{N-1} \cdot 2^{(N-1) \cdot n/N}\right) \bmod (2^n - 1) \qquad (5.52)$$

Case (ii): $n = N$

Steps 1,2: Identical to Steps 1,2 of Algorithm 5.1, Case (ii).

Step 3: Perform $(A(x) \cdot B(x) \bmod (x^n - 1))$ arithmetic mod
$p_2$ and suppose that

$$A(x) \cdot B(x) \bmod (x^n - 1) = c_0 + c_1 x + \dots + c_{n-1} x^{n-1} \qquad (5.53)$$

where $c_i \in Z_{p_2}$, $i = 0, 1, \dots, n - 1$.

Then the product $(A \cdot B) \bmod (2^n - 1)$ is

$$(A \cdot B) \bmod (2^n - 1) = \left(c_0 \cdot 2^0 + c_1 \cdot 2^1 + c_2 \cdot 2^2 + \dots + c_{n-1} \cdot 2^{n-1}\right) \bmod (2^n - 1) \qquad (5.54)$$
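The only change with respect to Algorithm 5.1 is the wrap-around rule
of the polynomial product, which the following Python sketch makes
explicit. As an illustration under stated assumptions, it forms the
cyclic product directly over the integers instead of through the PRNS
of Section 4.4, and the function names are hypothetical.

```python
def cyclic_product(a, b):
    """Coefficients of a(x) * b(x) mod (x^N - 1), computed over the integers."""
    N = len(a)
    c = [0] * N
    for i in range(N):
        for j in range(N):
            c[(i + j) % N] += a[i] * b[j]   # wrap-around term: x^N = +1
    return c


def product_mod_mersenne(A, B, n, N):
    """(A * B) mod (2^n - 1) rebuilt from the N polynomial coefficients."""
    assert n % N == 0, "n must be a multiple of N"
    width = n // N
    mask = (1 << width) - 1
    a = [(A >> (j * width)) & mask for j in range(N)]
    b = [(B >> (j * width)) & mask for j in range(N)]
    c = cyclic_product(a, b)
    total = sum(cj << (j * width) for j, cj in enumerate(c))
    return total % ((1 << n) - 1)


print(product_mod_mersenne(200, 10, 12, 4))  # n = 12, N = 4
```

Because every wrapped term enters with a plus sign, all coefficients
are non-negative here, unlike in the modulo $(x^N + 1)$ case.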
5.2.4  Prescaling

A final interesting subject remains to be presented: prescaling.

As already discussed in the previous sections of the present chapter,
the product $(A \cdot B) \bmod (2^n + 1)$ can be efficiently computed
in $N$ polynomial channels using the PRNS and an $n_1$-bit modulus,
where $n_1$ was found to be

$$n_1 = 2 \cdot \frac{n}{N} + \log_2 N \quad \text{if } n = k \cdot N,\ k \neq 1 \text{ and integer} \qquad (5.55)$$

$$n_1 = 1 + \log_2 n \quad \text{if } n = N$$

This way the minimal width of each PRNS channel should be $n_1$ bits,
where $n_1 < n$.

In the event that the modulus choice of our PRNS system is desired to
be $p_1 = 2^{n^*} + 1$ (elegant conventional designs, effective
scaling algorithm; see Section 5.2.2), $n^*$ must be such that
$p_1 = 2^{n^*} + 1$ allows the existence of the appropriate Nth-order
PRNS. The choice $n^* = n_1$ (where $n_1$ is given by equation (5.55))
might not necessarily satisfy the above requirement. Since the minimal
PRNS channel-width should be $n_1$ bits, $n^* > n_1$ must be chosen.
It might be the case that the choice of $n^* > n_1$ that allows the
existence of the PRNS mapping is so large that the PRNS would not
offer any advantageous solution to our problem. In such cases, it
might be useful to search for values of $n^* < n_1$ that give modulus
choices of the form $p_1 = 2^{n^*} + 1$ for which the PRNS exists. For
this to be successfully done, prescaling of the bytes
$A_0, \dots, A_{N-1}$, $B_0, \dots, B_{N-1}$ of the two numbers A and
B should take place, so that the reduced PRNS channel-width would
still give full-precision correct results.

For a better understanding of the above, consider the product
$(A \cdot B) \bmod (2^{24} + 1)$, where both A and B are 24-bit
integers. Suppose we want to compute the above product using an
Nth-order PRNS with $N = 8$. Here, according to Algorithm 5.1, each
byte $A_0, \dots, A_7$, $B_0, \dots, B_7$ would be a 3-bit number. The
minimal PRNS channel width should be $n_1 = 2 \cdot n/N + \log_2 N$,
where $n = 24$, $N = 8$. So
$n_1 = 2 \cdot 24/8 + \log_2 8 = 2 \cdot 3 + 3 = 9$ bits. A modulus
$p_1 = 2^9 + 1 = 513$ does not allow the existence of the 8th-order
PRNS, while $p_1 = 2^8 + 1 = 257$ allows it, since 257 is a prime
number of the form $p = 16 \cdot k + 1 = 16 \cdot 16 + 1$
(Corollary 4.1). If the bytes $A_0, \dots, A_7$, $B_0, \dots, B_7$
were a little smaller than 3 bits each, then the 8-bit PRNS system
based on the modulus choice $p_1 = 2^8 + 1 = 257$ would do a
satisfactory job. Prescaling is a meaningful solution to the above
problem and is presented here by an example.
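The search for a usable exponent $n^*$ can be illustrated with a small
Python sketch. It assumes, generalizing from the two cases quoted in
the text ($p = 8k + 1$ for $N = 4$ and $p = 16k + 1$ for $N = 8$),
that the Nth-order PRNS exists when $p = 2^{n^*} + 1$ is a prime of
the form $2N \cdot k + 1$; the exact conditions are those of equation
(5.5) and Corollary 4.1, which are not restated here, and the function
names are hypothetical.

```python
def is_prime(p):
    """Trial division; adequate for the small moduli considered here."""
    if p < 2:
        return False
    d = 2
    while d * d <= p:
        if p % d == 0:
            return False
        d += 1
    return True


def prns_exists(n_star, N):
    """Assumed existence test for modulus p = 2^n_star + 1 and order N."""
    p = (1 << n_star) + 1
    return is_prime(p) and (p - 1) % (2 * N) == 0


def candidate_widths(N, max_width=20):
    """All exponents n_star up to max_width that pass the assumed test."""
    return [w for w in range(1, max_width + 1) if prns_exists(w, N)]


print(prns_exists(9, 8), prns_exists(8, 8))   # 513 versus 257 for N = 8
print(candidate_widths(8))
```

Any candidate below the minimal width $n_1$ is only usable together
with prescaling of the bytes, as described next.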

Suppose we want to perform $(A \cdot B) \bmod (2^{12} + 1)$, where A,
B are 12-bit integers, using a 4th-order SM/PRNS. In such a case (due
to Algorithm 5.1), each byte $A_0, \dots, A_3$, $B_0, \dots, B_3$ of
the two numbers A, B would be a 3-bit number, while the minimal PRNS
channel-width should be
$n_1 = 2 \cdot n/N + \log_2 N = 2 \cdot 12/4 + \log_2 4 = 2 \cdot 3 + 2 = 8$
bits. The modulus $p_1 = 2^8 + 1 = 257$ allows the existence of the
4th-order PRNS mapping (because 257 is prime and
$257 = 8 \cdot 32 + 1$ (equation (5.5))). Suppose that for the purpose
of hardware savings we want to compute the above product using a
4th-order PRNS and a PRNS channel-width of 4 bits instead of 8 bits.
For such a PRNS system the modulus $p_2 = 2^4 + 1 = 17$ would be
appropriate, because 17 is a prime number and also
$17 = 8 \cdot 2 + 1$ (equation (5.5)).

To facilitate the above processing in a 4-bit PRNS system, the
following prescaling of $A_0, \dots, A_3$, $B_0, \dots, B_3$ should
take place. Suppose that the two 12-bit numbers A and B are expressed
in byte form as

$$A = A_0 \cdot 2^0 + A_1 \cdot 2^3 + A_2 \cdot 2^6 + A_3 \cdot 2^9 \qquad (5.56)$$

$$B = B_0 \cdot 2^0 + B_1 \cdot 2^3 + B_2 \cdot 2^6 + B_3 \cdot 2^9$$

Then, by using Algorithm 5.1, the product
$(A \cdot B) \bmod (2^{12} + 1)$ is given by

$$(A \cdot B) \bmod (2^{12} + 1) = \left(C_0 \cdot 2^0 + C_1 \cdot 2^3 + C_2 \cdot 2^6 + C_3 \cdot 2^9\right) \bmod (2^{12} + 1) \qquad (5.67)$$

where $C_0$, $C_1$, $C_2$, $C_3$ are such that

$$C_0 + C_1 x + C_2 x^2 + C_3 x^3 = (A_0 + A_1 x + A_2 x^2 + A_3 x^3) \cdot (B_0 + B_1 x + B_2 x^2 + B_3 x^3) \bmod (x^4 + 1) \qquad (5.68)$$

Then, $C_i$, $i = 0, \dots, 3$, are given by

$$C_0 = A_0 \cdot B_0 - A_1 \cdot B_3 - A_2 \cdot B_2 - A_3 \cdot B_1 \qquad (5.69)$$

$$C_1 = A_0 \cdot B_1 + A_1 \cdot B_0 - A_2 \cdot B_3 - A_3 \cdot B_2$$

$$C_2 = A_0 \cdot B_2 + A_2 \cdot B_0 + A_1 \cdot B_1 - A_3 \cdot B_3$$

$$C_3 = A_0 \cdot B_3 + A_1 \cdot B_2 + A_2 \cdot B_1 + A_3 \cdot B_0$$
Obviously, as expected from Algorithm 5.1, each $C_i$,
$i = 0, \dots, 3$, requires 8 bits to be securely represented, so that
the modulus choice $p_1 = 2^8 + 1 = 257$ for each PRNS channel would
give the correct result.

Suppose that we want to use a modulus $p_2 = 2^4 + 1 = 17$. Consider
the products $A_i \cdot B_j$ of equation (5.69). Each $A_i$ and $B_j$,
$i, j = 0, \dots, 3$, is a 3-bit number and has a value between 0
and 7.

Assume that we prescale each one of $A_i$ and $B_j$ by a scale factor
of $2^2 = 4$, or in other words, consider the division of $A_i$ and
$B_j$ by 4 as follows:

$$A_i = 4 \tilde{A}_i + e_{A_i} \qquad (5.70)$$

$$B_j = 4 \tilde{B}_j + e_{B_j}$$

where $\tilde{A}_i, \tilde{B}_j \in \{0, 1\}$ and
$e_{A_i}, e_{B_j} \in \{0, 1, 2, 3\}$. Then, $\tilde{A}_i$ and
$\tilde{B}_j$ are 1-bit numbers while $e_{A_i}$ and $e_{B_j}$ are
2-bit numbers.

Then each product $A_i \cdot B_j$ can be written as

$$A_i \cdot B_j = (4 \tilde{A}_i + e_{A_i}) \cdot (4 \tilde{B}_j + e_{B_j}) = 16 \tilde{A}_i \tilde{B}_j + 4 \tilde{B}_j e_{A_i} + 4 \tilde{A}_i e_{B_j} + e_{A_i} e_{B_j} \qquad (5.71)$$

Dividing $e_{A_i}$ by 2, we get

$$e_{A_i} = 2 \tilde{e}_{A_i} + \mathit{ee}_{A_i} \qquad (5.72)$$

where $\tilde{e}_{A_i}, \mathit{ee}_{A_i} \in \{0, 1\}$, and they are
both 1-bit numbers.
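Finally, substituting in the last term of equation (5.71) for
$e_{A_i}$ obtained from equation (5.72), we get for the product
$A_i B_j$

$$A_i B_j = 16 \tilde{A}_i \tilde{B}_j + 4 \tilde{B}_j e_{A_i} + 4 \tilde{A}_i e_{B_j} + 2 \tilde{e}_{A_i} e_{B_j} + \mathit{ee}_{A_i} e_{B_j} \qquad (5.73)$$

Now the coefficients $C_0$, $C_1$, $C_2$, $C_3$ of the overall product
$(A \cdot B) \bmod (2^{12} + 1)$ (equation (5.69)) are given by

$$C_0 = 16 C_{00} + 4 C_{01} + 4 C_{02} + 2 C_{03} + C_{04} \qquad (5.74)$$

$$C_1 = 16 C_{10} + 4 C_{11} + 4 C_{12} + 2 C_{13} + C_{14}$$

$$C_2 = 16 C_{20} + 4 C_{21} + 4 C_{22} + 2 C_{23} + C_{24}$$

$$C_3 = 16 C_{30} + 4 C_{31} + 4 C_{32} + 2 C_{33} + C_{34}$$

where

$$C_{00} = \tilde{A}_0 \tilde{B}_0 - \tilde{A}_1 \tilde{B}_3 - \tilde{A}_2 \tilde{B}_2 - \tilde{A}_3 \tilde{B}_1 \qquad (5.75)$$

$$C_{01} = e_{A_0} \tilde{B}_0 - e_{A_1} \tilde{B}_3 - e_{A_2} \tilde{B}_2 - e_{A_3} \tilde{B}_1$$

$$C_{02} = \tilde{A}_0 e_{B_0} - \tilde{A}_1 e_{B_3} - \tilde{A}_2 e_{B_2} - \tilde{A}_3 e_{B_1}$$

$$C_{03} = \tilde{e}_{A_0} e_{B_0} - \tilde{e}_{A_1} e_{B_3} - \tilde{e}_{A_2} e_{B_2} - \tilde{e}_{A_3} e_{B_1}$$

$$C_{04} = \mathit{ee}_{A_0} e_{B_0} - \mathit{ee}_{A_1} e_{B_3} - \mathit{ee}_{A_2} e_{B_2} - \mathit{ee}_{A_3} e_{B_1}$$

Similar expressions (with different indexes being the only difference)
can be found for the coefficients $(C_{10}, \dots, C_{14})$,
$(C_{20}, \dots, C_{24})$, $(C_{30}, \dots, C_{34})$ of equation
(5.74).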
It can be observed that each one of the $C_{km}$, $k = 0, \dots, 3$,
$m = 0, \dots, 4$, is at most a 4-bit number (due to the fact that
$\tilde{A}_i$, $\tilde{B}_j$ are 1-bit numbers, $e_{A_i}$, $e_{B_j}$
are 2-bit numbers and $\tilde{e}_{A_i}$, $\mathit{ee}_{A_i}$ are 1-bit
numbers). Besides this, the coefficients
$(C_{00}, C_{10}, C_{20}, C_{30})$ can be derived from the following
polynomial product:

$$C_{00} + C_{10} x + C_{20} x^2 + C_{30} x^3 = \left((\tilde{A}_0 + \tilde{A}_1 x + \tilde{A}_2 x^2 + \tilde{A}_3 x^3) \cdot (\tilde{B}_0 + \tilde{B}_1 x + \tilde{B}_2 x^2 + \tilde{B}_3 x^3) \bmod (x^4 + 1)\right) \text{ arithmetic mod } p_2 \qquad (5.76)$$

where $p_2$ is a 4-bit modulus. Such a $p_2$ can be
$p_2 = 2^4 + 1 = 17$. Here a PRNS channel-width of four bits is
adequate since all the coefficients involved are at most 4-bit
numbers.

Similarly, the remaining coefficients can be obtained from four more
third-order polynomial products mod $(x^4 + 1)$ as follows:

$$(C_{01}, C_{11}, C_{21}, C_{31}) = (e_{A_0}, e_{A_1}, e_{A_2}, e_{A_3}) \cdot (\tilde{B}_0, \tilde{B}_1, \tilde{B}_2, \tilde{B}_3) \bmod (x^4 + 1) \qquad (5.77)$$

$$(C_{02}, C_{12}, C_{22}, C_{32}) = (\tilde{A}_0, \tilde{A}_1, \tilde{A}_2, \tilde{A}_3) \cdot (e_{B_0}, e_{B_1}, e_{B_2}, e_{B_3}) \bmod (x^4 + 1)$$

$$(C_{03}, C_{13}, C_{23}, C_{33}) = (\tilde{e}_{A_0}, \tilde{e}_{A_1}, \tilde{e}_{A_2}, \tilde{e}_{A_3}) \cdot (e_{B_0}, e_{B_1}, e_{B_2}, e_{B_3}) \bmod (x^4 + 1)$$

$$(C_{04}, C_{14}, C_{24}, C_{34}) = (\mathit{ee}_{A_0}, \mathit{ee}_{A_1}, \mathit{ee}_{A_2}, \mathit{ee}_{A_3}) \cdot (e_{B_0}, e_{B_1}, e_{B_2}, e_{B_3}) \bmod (x^4 + 1)$$

where all the above arithmetic can be considered mod 17.

Obviously, the four coefficients $C_0$, $C_1$, $C_2$, $C_3$ of
equations (5.68) and (5.69), instead of being computed with one
polynomial product mod $(x^4 + 1)$ and four 8-bit polynomial channels,
can now be computed with five polynomial products mod $(x^4 + 1)$ (the
polynomial products shown by equations (5.76) and (5.77)) and twenty
4-bit polynomial channels.
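The recombination of equation (5.74) can be checked numerically. The
Python sketch below is an illustration under stated assumptions: the
five small products are computed directly over the integers with
signed coefficients (no PRNS mapping, no reduction mod 17), and the
function names are hypothetical.

```python
def negacyclic(a, b):
    """Coefficients of a(x) * b(x) mod (x^4 + 1), over the integers."""
    n = len(a)
    c = [0] * n
    for i in range(n):
        for j in range(n):
            if i + j < n:
                c[i + j] += a[i] * b[j]
            else:
                c[i + j - n] -= a[i] * b[j]   # x^4 = -1
    return c


def small_products(a, b):
    """The five products of (5.76) and (5.77) with their weights from (5.74)."""
    a_hi = [x >> 2 for x in a]        # A~_i: most significant bit of A_i
    b_hi = [x >> 2 for x in b]        # B~_j: most significant bit of B_j
    e_a = [x & 3 for x in a]          # e_A: two least significant bits of A_i
    e_b = [x & 3 for x in b]          # e_B: two least significant bits of B_j
    e_a_hi = [x >> 1 for x in e_a]    # e~_A: middle bit of A_i
    e_a_lo = [x & 1 for x in e_a]     # ee_A: least significant bit of A_i
    return [
        (16, negacyclic(a_hi, b_hi)),     # C_k0
        (4, negacyclic(e_a, b_hi)),       # C_k1
        (4, negacyclic(a_hi, e_b)),       # C_k2
        (2, negacyclic(e_a_hi, e_b)),     # C_k3
        (1, negacyclic(e_a_lo, e_b)),     # C_k4
    ]


def prescaled_coefficients(a, b):
    """C_0..C_3 rebuilt as 16 C_k0 + 4 C_k1 + 4 C_k2 + 2 C_k3 + C_k4."""
    parts = small_products(a, b)
    return [sum(w * p[k] for w, p in parts) for k in range(4)]


a, b = [0, 1, 3, 0], [2, 1, 0, 0]     # bytes of 200 and 10
print(prescaled_coefficients(a, b), negacyclic(a, b))
```

In this sketch every small coefficient has magnitude at most 12, which
is the property that lets each of them be carried in a mod 17 channel.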

Without  prescaling,  the  total  memory  requirement  for  four  8-bit 
PRNS  channels  would  be  4 x 2 ^ x 8 = 2^1  bits  of  memory,  while  with 
prescaling  the  total  requirement  for  twenty  4-bit  PRNS  channels 
would  be  20  x 2®  x 4 = 20  x 2^  bits;  considerably  less  than  before. 
At  the  same  time,  the  hardware  requirements  for  the  forward  and 
inverse  4th-order  PRNS  mappings  ( f ^ would  be  less  if 

prescaling  is  applied,  due  to  the  smaller  channel-wordlength. 

If prescaling is not applied, two PRNS mappers $f_4$ would be
required to map the tuples $(A_0, A_1, A_2, A_3)$ and $(B_0, B_1, B_2, B_3)$,
each one requiring eight tables (Figure 5.5), each table having 8-bit
data buses; a total memory requirement of
$16 \times 2^{16} \times 8 = 2^{23}$ bits. If prescaling is considered, then
six mapping units would be required to map the tuples
$(\tilde{A}_0, \tilde{A}_1, \tilde{A}_2, \tilde{A}_3)$,
$(\tilde{B}_0, \tilde{B}_1, \tilde{B}_2, \tilde{B}_3)$,
$(e_{A_0}, e_{A_1}, e_{A_2}, e_{A_3})$,
$(e_{B_0}, e_{B_1}, e_{B_2}, e_{B_3})$,
$(\tilde{e}_{A_0}, \tilde{e}_{A_1}, \tilde{e}_{A_2}, \tilde{e}_{A_3})$,
$(e_{e_{A_0}}, e_{e_{A_1}}, e_{e_{A_2}}, e_{e_{A_3}})$ and 4-bit data buses.
In this case, the overall requirement is
$48 \times 2^{8} \times 4 = 48 \times 2^{10}$ bits of memory, considerably
less than before.

The inverse mapping $f_4^{-1}$ requires twelve tables (Figure 5.6).
The case of no prescaling would call for one $f_4^{-1}$ unit (to map
$(C_0^*, C_1^*, C_2^*, C_3^*)$) and 8-bit data buses, or a total of
$12 \times 2^{16} \times 8$ bits of memory, while the case of prescaling
would call for five $f_4^{-1}$ units (to map the tuples
$(C_{00}^*, C_{10}^*, C_{20}^*, C_{30}^*)$,
$(C_{01}^*, C_{11}^*, C_{21}^*, C_{31}^*)$, ...,
$(C_{04}^*, C_{14}^*, C_{24}^*, C_{34}^*)$) and 4-bit data buses, a total of
$60 \times 2^{8} \times 4$ memory bits; again less than in the case of no
prescaling.
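The memory totals quoted above can be tallied with a short sketch. It assumes, as the quoted figures imply, that every lookup table is addressed by two operands of the channel wordlength (hence $2^{16}$ entries for 8-bit channels and $2^{8}$ entries for 4-bit channels); the helper name `lut_bits` is hypothetical and not part of the original text.

```python
def lut_bits(tables, operand_bits, word_bits):
    """Memory in bits for `tables` lookup tables, each addressed by two
    operands of `operand_bits` bits and storing `word_bits`-bit words.
    (Assumed table model; it reproduces the totals quoted in the text.)"""
    return tables * 2 ** (2 * operand_bits) * word_bits


# Without prescaling: 8-bit channels.
no_prescaling = {
    "channels": lut_bits(4, 8, 8),    # four polynomial channels
    "forward": lut_bits(16, 8, 8),    # two f_4 mappers, eight tables each
    "inverse": lut_bits(12, 8, 8),    # one inverse mapper, twelve tables
}

# With prescaling: 4-bit channels.
prescaling = {
    "channels": lut_bits(20, 4, 4),   # twenty polynomial channels
    "forward": lut_bits(48, 4, 4),    # six f_4 mappers, eight tables each
    "inverse": lut_bits(60, 4, 4),    # five inverse mappers, twelve tables each
}

for stage in no_prescaling:
    print(stage, no_prescaling[stage], prescaling[stage])
```

Under this table model the prescaled design needs fewer bits at every stage, which matches the comparison made in the text.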

As a last observation, it has to be mentioned that the task of
getting $\tilde{A}_i$, $\tilde{B}_i$, $e_{A_i}$, $e_{B_i}$, $\tilde{e}_{A_i}$,
and $e_{e_{A_i}}$ from $A_i$ and $B_i$ is really trivial. Recalling that
each $A_i$, $B_i$ is a 3-bit number, $\tilde{A}_i$ is the most significant
digit of $A_i$, $\tilde{B}_i$ the most significant digit of $B_i$,
$e_{A_i}$ and $e_{B_i}$ are the 2-bit numbers consisting of the two least
significant digits of $A_i$ and $B_i$ respectively (equation (5.70)),
$\tilde{e}_{A_i}$ is the middle significant digit of $A_i$, while
$e_{e_{A_i}}$ is the least significant digit of $A_i$ (equation (5.72)).


CHAPTER  6 

APPLICATIONS  OF  THE  PRNS  IN  ARRAY  PROCESSING 

In the previous chapter (Chapter 5) several applications of the
Nth-order SM/PRNS in complex and real arithmetic were presented.
These applications were not really Nth-order ones.

The multiplication of two complex numbers is actually a problem of
second order, since a complex number is a 2-dimensional entry (it
consists of a real and an imaginary channel). The procedure followed
in Chapter 5 was to force the complex number (x, y) into N channels
$(u_0, u_1, \ldots, u_{N-1})$ and then apply the Nth-order SM/PRNS to
process such a complex multiplication. The specific case presented in
Chapter 5 was N = 4, where a complex number (x, y) was decomposed into
four channels $(u_0, u_1, u_2, u_3)$ and then processed with a PRNS of
order four.

In the case of real arithmetic, each integer A (which is actually
a 1-dimensional entry) was decomposed into N channels
$(A_0, A_1, \ldots, A_{N-1})$ and then the product $(A \cdot B) \bmod (2^n + 1)$
was computed with an Nth-order PRNS (Algorithms 5.1, 5.3).

In reality, complex and real arithmetic are pseudo N-dimensional
problems. They are processed with an Nth-order PRNS because some
encoding has been employed in order to decompose the 2-dimensional
or 1-dimensional entry into an N-dimensional one. Since some effort
is spent in transforming numbers between 1 or 2 dimensions on one
hand and N dimensions on the other, the Nth-order PRNS cannot really
offer its best advantages.

It would be much more advantageous and meaningful to be able to
apply the Nth-order PRNS in some truly N-dimensional applications.

6.1 The Product-of-Sums Algorithm


One N-dimensional processing example would be a typical product
of sums statement

$$C = (A_0 + A_1 + \cdots + A_{N-1}) \cdot (B_0 + B_1 + \cdots + B_{N-1}) \qquad (6.1)$$

Such computations can be frequently found in matrix-matrix
multiplications.

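Consider, as an example, the nested product of four matrices
A · B · C · D, where

$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1N} \\ a_{21} & a_{22} & \cdots & a_{2N} \\ \vdots & \vdots & & \vdots \\ a_{N1} & a_{N2} & \cdots & a_{NN} \end{bmatrix}, \qquad B = \begin{bmatrix} b_{11} & b_{12} & \cdots & b_{1N} \\ b_{21} & b_{22} & \cdots & b_{2N} \\ \vdots & \vdots & & \vdots \\ b_{N1} & b_{N2} & \cdots & b_{NN} \end{bmatrix}$$

$$C = \begin{bmatrix} c_{11} & c_{12} & \cdots & c_{1N} \\ c_{21} & c_{22} & \cdots & c_{2N} \\ \vdots & \vdots & & \vdots \\ c_{N1} & c_{N2} & \cdots & c_{NN} \end{bmatrix}, \qquad D = \begin{bmatrix} d_{11} & d_{12} & \cdots & d_{1N} \\ d_{21} & d_{22} & \cdots & d_{2N} \\ \vdots & \vdots & & \vdots \\ d_{N1} & d_{N2} & \cdots & d_{NN} \end{bmatrix}$$

The product A · B is

$$A \cdot B = \begin{bmatrix} e_{11} & e_{12} & \cdots & e_{1N} \\ e_{21} & e_{22} & \cdots & e_{2N} \\ \vdots & \vdots & & \vdots \\ e_{N1} & e_{N2} & \cdots & e_{NN} \end{bmatrix} \qquad (6.2)$$

where for i, j = 1, ..., N

$$e_{ij} = a_{i1} b_{1j} + a_{i2} b_{2j} + \cdots + a_{iN} b_{Nj} \qquad (6.3)$$

or

$$e_{ij} = A_0 + A_1 + \cdots + A_{N-1} \qquad (6.4)$$

while the product C · D is

$$C \cdot D = \begin{bmatrix} f_{11} & f_{12} & \cdots & f_{1N} \\ f_{21} & f_{22} & \cdots & f_{2N} \\ \vdots & \vdots & & \vdots \\ f_{N1} & f_{N2} & \cdots & f_{NN} \end{bmatrix} \qquad (6.5)$$

with $f_{ij}$, i, j = 1, ..., N given by

$$f_{ij} = c_{i1} d_{1j} + c_{i2} d_{2j} + \cdots + c_{iN} d_{Nj} \qquad (6.6)$$

or

$$f_{ij} = B_0 + B_1 + \cdots + B_{N-1} \qquad (6.7)$$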

Obviously, the matrix product E = A · B · C · D is computation-intensive
in operations of the "product-of-sums" form, like the one shown by
equation (6.1). There are many Digital Signal Processing applications,
including Image Processing, Geophysical Signal Processing, etc.
[Opp78a], where matrix-intensive operations are frequent.
The product-of-sums equation (6.1) can be easily written as a
polynomial-product statement as follows:

$$C = (A_0 + A_1 + \cdots + A_{N-1}) \cdot (B_0 + B_1 + \cdots + B_{N-1}) = C_0 + C_1 + \cdots + C_{N-1} \qquad (6.8)$$

where $C_0, C_1, \ldots, C_{N-1}$ are the coefficients of the polynomial
product shown below:

$$C_0 + C_1 x + \cdots + C_{N-1} x^{N-1} = (A_0 + A_1 x + \cdots + A_{N-1} x^{N-1}) \cdot (B_0 + B_1 x + \cdots + B_{N-1} x^{N-1}) \bmod (x^N - 1) \qquad (6.9)$$

If such product-of-sums intensive operations need to be
performed in some modular ring $Z_p$, then the above polynomial-product
equation (6.9) can be computed in $Z_p$ using N parallel channels and
with reduced complexity, employing the Nth-order PRNS, which is based
on the factorization of $x^N - 1$ (Section 4.4).
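The equivalence between equations (6.1), (6.8), and (6.9) can be checked numerically. The following is a minimal sketch, assuming an illustrative modulus p = 17 and arbitrary operand values (neither is taken from the text); `cyclic_product` is a hypothetical helper that forms the coefficients of the product modulo $x^N - 1$ directly, without any PRNS mapping.

```python
def cyclic_product(a, b, p):
    """Coefficients of a(x) * b(x) mod (x^N - 1), with arithmetic in Z_p."""
    n = len(a)
    c = [0] * n
    for i in range(n):
        for j in range(n):
            # x^(i+j) wraps around because x^N = 1 modulo (x^N - 1).
            c[(i + j) % n] = (c[(i + j) % n] + a[i] * b[j]) % p
    return c


p = 17                      # illustrative modulus (assumption)
A = [3, 5, 7, 11]           # A_0 .. A_3 (arbitrary sample values)
B = [2, 4, 6, 8]            # B_0 .. B_3 (arbitrary sample values)

direct = (sum(A) * sum(B)) % p          # equation (6.1)
C = cyclic_product(A, B, p)             # equation (6.9)
via_polynomial = sum(C) % p             # equation (6.8)

print(direct, via_polynomial)           # both print 10
```

The two values agree because summing the coefficients of a polynomial is the same as evaluating it at x = 1, and x = 1 is always a root of $x^N - 1$.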

Unfortunately, viewing a product of sums as a polynomial product
(as in nested matrix-matrix multiplications) proved to offer no
real advantages in terms of hardware savings or speed increase when
the product is polynomially decomposed into N channels. The reason is
very simple: in order to compute equation (6.1) in $Z_p$ without
the polynomial decomposition, N - 1 two-operand additions are
required for each of the sums $A_0 + A_1 + \cdots + A_{N-1}$ and
$B_0 + B_1 + \cdots + B_{N-1}$, followed by one two-operand multiplication.
If table lookup implementations are considered, then a total of
2N - 1 table lookups are required for the implementation of equation
(6.1).

If the Nth-order PRNS is to be used, then the computation of
equation (6.1) can be viewed as the polynomial product described by
equation (6.9). In this case, N multiplications performed in N
parallel channels are required to obtain the N-channel result
$C_0, C_1, \ldots, C_{N-1}$, but at the very end the expression
$C = C_0 + C_1 + \cdots + C_{N-1}$ must still be evaluated. This requires
N - 1 two-operand additions, so a total of 2N - 1 operations would be
necessary for a product-of-sums statement, the same count as before.
Obviously, for the PRNS decomposition to offer any real benefits for
these applications, techniques must be developed for bypassing the
problems mentioned above.

6.2 Communication Correlator Receivers

Autocorrelation and cross-correlation between signals are
essential techniques in almost every communication receiver [Opp78a,
Pee76a, Spi77a].

Communication signals are affected by many distortion
parameters, including atmospheric conditions, radiation, and folded
paths, to name a few. In most cases, cross-correlation and
autocorrelation techniques are the main noise-removal procedures.

In the case of multichannel transmissions (i.e., multiphase or
multifrequency-modulated signals, etc.), the operations of
cross-correlation and autocorrelation can become extremely complex and
time consuming, since each signal consists of multiple channels. If
speed is the critical design parameter, fast correlators performing
the receiving operations must be considered. The PRNS can be shown
to provide meaningful solutions for the design of multichannel
correlator receivers.

The interesting example of the Split Beam Correlator Array
Processor [Opp78a] (used in sonars and other applications) is
shown, in block diagram form, in Figure 6.1. The array of
transducer elements is partitioned into two sections, each of which
is beamed as a linear array. The outputs of these two sections are
then cross-correlated. The concept of operation is that, if there
is a signal source in the direction of the beam, it will be focused
by each partition and produce identical, or similar, outputs that
will display a high degree of correlation. Conversely, if there is
no energy in the signal field in the direction of the beams, the
array outputs will be dissimilar and will not be strongly
correlated. In general, the value of the cross-correlation between
the two partitions will be translated into a difference $|d_1 - d_2|$
between the distances of the moving object from the two transducer
arrays. Thus, identification of the location of the moving object
can take place.

If T is the period of the received signal
$s_1 = (s_1(0), s_1(\delta t), s_1(2\delta t), \ldots, s_1(\lambda\,\delta t), \ldots)$,
then

$$\Delta T \triangleq \frac{T}{N} \qquad (6.10)$$

where N is the number of elements in each one of the two transducer
arrays, and

$$\Delta^p s_j \triangleq s_j(t + p\,\Delta T) \qquad (6.11)$$

or $\Delta^p s_j$ is a version of $s_j$ delayed by $-p\,\Delta T$. Since
$d_1 \neq d_2$, $s_2$ is a delayed version of $s_1$, by some delay, say
$-m\,\Delta T$, or

$$s_2 = s_1(t + m\,\Delta T) \qquad (6.12)$$

[Figure 6.1 Split Beam Correlator Array Processor]

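The cross-correlation between the outputs of the two arrays would
have the following expression:

$$\begin{aligned}
R_{s_1 s_2}(0) &= s_1 \cdot s_2 + \Delta s_1 \cdot \Delta s_2 + \Delta^2 s_1 \cdot \Delta^2 s_2 + \cdots + \Delta^{N-1} s_1 \cdot \Delta^{N-1} s_2 \\
R_{s_1 s_2}(1) &= s_1 \cdot \Delta s_2 + \Delta s_1 \cdot \Delta^2 s_2 + \Delta^2 s_1 \cdot \Delta^3 s_2 + \cdots + \Delta^{N-1} s_1 \cdot s_2 \\
R_{s_1 s_2}(2) &= s_1 \cdot \Delta^2 s_2 + \Delta s_1 \cdot \Delta^3 s_2 + \Delta^2 s_1 \cdot \Delta^4 s_2 + \cdots + \Delta^{N-1} s_1 \cdot \Delta s_2 \\
&\;\;\vdots \\
R_{s_1 s_2}(N-1) &= s_1 \cdot \Delta^{N-1} s_2 + \Delta s_1 \cdot s_2 + \Delta^2 s_1 \cdot \Delta s_2 + \cdots + \Delta^{N-1} s_1 \cdot \Delta^{N-2} s_2
\end{aligned} \qquad (6.13)$$

where in the above equation (6.13), each $\Delta^i s_1 \cdot \Delta^j s_2$ is an
inner product (vector product) and not a product of two numbers.

It is obvious from equation (6.13) that such a cross-correlation
between two N-channel signals would call for $N^2$ inner products in a
first level of vector operations and N(N - 1) sums of vectors in the
next $\log_2 N$ levels of vector sums; an extremely complex and
time-consuming process.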
As a numerical example illustrating the mechanism of the
correlation between two N-channel signals, consider the case of the
Split Beam Correlator Receiver of Figure 6.1 for N = 4 and suppose
that the received signal $s_1$ in the upper group of channels is
$s_1 = \{s_1(0), s_1(\delta t), s_1(2\delta t), \ldots, s_1(15\delta t), s_1(0), s_1(\delta t), \ldots\}$
= {4, 5, 6, 7, 8, 7, 6, 5, 4, 3, 2, 1, 0, 1, 2, 3, 4, 5, ...}. Such a
signal has period $T = 16\delta t$ while $\Delta T = 16\delta t/4 = 4\delta t$
(equation (6.10)). If the received signal in the lower group of
channels is $s_2(t) = s_1(t + \delta t)$ and it is desired to obtain the
cross-correlation of $s_1$ and $s_2$, then equations (6.13) will give for
the case of N = 4

$$\begin{aligned}
R_{s_1 s_2}(0) &= s_1 \cdot s_2 + \Delta s_1 \cdot \Delta s_2 + \Delta^2 s_1 \cdot \Delta^2 s_2 + \Delta^3 s_1 \cdot \Delta^3 s_2 \\
R_{s_1 s_2}(1) &= s_1 \cdot \Delta s_2 + \Delta s_1 \cdot \Delta^2 s_2 + \Delta^2 s_1 \cdot \Delta^3 s_2 + \Delta^3 s_1 \cdot s_2 \\
R_{s_1 s_2}(2) &= s_1 \cdot \Delta^2 s_2 + \Delta s_1 \cdot \Delta^3 s_2 + \Delta^2 s_1 \cdot s_2 + \Delta^3 s_1 \cdot \Delta s_2 \\
R_{s_1 s_2}(3) &= s_1 \cdot \Delta^3 s_2 + \Delta s_1 \cdot s_2 + \Delta^2 s_1 \cdot \Delta s_2 + \Delta^3 s_1 \cdot \Delta^2 s_2
\end{aligned} \qquad (6.14)$$

where $s_1$ = (4, 5, 6, 7, ...), $\Delta s_1$ = (8, 7, 6, 5, ...),
$\Delta^2 s_1$ = (4, 3, 2, 1, ...), $\Delta^3 s_1$ = (0, 1, 2, 3, ...),
$s_2$ = (5, 6, 7, 8, ...), $\Delta s_2$ = (7, 6, 5, 4, ...),
$\Delta^2 s_2$ = (3, 2, 1, 0, ...), $\Delta^3 s_2$ = (1, 2, 3, 4, ...).
Denoting the inner product as

$$(a_0, a_1, a_2, a_3) \cdot (b_0, b_1, b_2, b_3) \triangleq a_0 b_0 + a_1 b_1 + a_2 b_2 + a_3 b_3 \qquad (6.15)$$

the obtained cross-correlation between $s_1(t)$ and $s_1(t + \delta t)$ has
terms $R_{s_1 s_2}(0) = 336$, $R_{s_1 s_2}(1) = 224$, $R_{s_1 s_2}(2) = 176$,
$R_{s_1 s_2}(3) = 288$. The maximum term is the zero lag of the
cross-correlation, $R_{s_1 s_2}(0) = 336$, and the meaning is that $s_1$ and
$s_2$ are strongly correlated so that $s_2$ is decoded as $s_2(t) = s_1(t)$.

If $s_2(t) = s_1(t + 2\delta t)$ ($s_2$ is a version of $s_1$ delayed by
$-2\delta t$), the terms of the cross-correlation are found to be
$R_{s_1 s_2}(0) = 316$, $R_{s_1 s_2}(1) = 196$, $R_{s_1 s_2}(2) = 196$,
$R_{s_1 s_2}(3) = 316$, and the maximum term is not uniquely defined. The
two maximum terms are $R_{s_1 s_2}(0) = R_{s_1 s_2}(3) = 316$ and $s_2$ can
be decoded as $s_2(t) = s_1(t)$ or
$s_2(t) = \Delta^3 s_1(t) = s_1(t + 3\Delta T) = s_1(t - \Delta T)$ (a version
of $s_1(t)$ delayed by $\Delta T$). The two cases are equally likely to
occur and the decoder will be designed to decode one way or the
other, with a probability of decoding error of 50 percent. This
result was expected: since the signal $s_2$ is out of phase by
$2\delta t$ with respect to $s_1$ and the decoder can equalize $s_2(t)$
only with one of $s_1(t)$, $s_1(t - \Delta T)$, $s_1(t - 2\Delta T)$, or
$s_1(t - 3\Delta T)$, where $\Delta T = 4\delta t$, the decoder has to decide
in an arbitrary way whether to equalize $s_2(t) = s_1(t - 2\delta t)$ with
$s_1(t)$ or with its equally likely candidate
$s_1(t - \Delta T) = s_1(t - 4\delta t)$.
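The two numerical cases above can be reproduced with a short sketch of equations (6.13) and (6.15). It assumes, as the worked values imply, that each channel vector holds the first four samples of the correspondingly shifted signal; the names `segment`, `inner`, and `cross_correlation` are hypothetical helpers introduced only for this illustration.

```python
# One period of s1, T = 16 sample intervals (values from the text).
PERIOD = [4, 5, 6, 7, 8, 7, 6, 5, 4, 3, 2, 1, 0, 1, 2, 3]
N = 4                        # number of channels
STEP = len(PERIOD) // N      # Delta T expressed in sample intervals (= 4)


def segment(offset, k):
    """First STEP samples of s1(t + offset*dt) after a shift of k*Delta T."""
    start = offset + k * STEP
    return [PERIOD[(start + i) % len(PERIOD)] for i in range(STEP)]


def inner(u, v):
    """Inner product of equation (6.15)."""
    return sum(x * y for x, y in zip(u, v))


def cross_correlation(delay):
    """R(k), k = 0..N-1, of equation (6.13) for s2(t) = s1(t + delay*dt)."""
    return [
        sum(inner(segment(0, i), segment(delay, (i + k) % N)) for i in range(N))
        for k in range(N)
    ]


print(cross_correlation(1))  # [336, 224, 176, 288]
print(cross_correlation(2))  # [316, 196, 196, 316]
```

The first case has a unique maximum at lag 0, while the second shows the tie between lags 0 and 3 discussed above.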

For the general case of an N-channel correlator receiver (Figure
6.1, equations (6.13)), define the following polynomials P(x) and
Q(x), each having N vector coefficients:

$$P(x) = \Delta^{N-1} s_2 + \Delta^{N-2} s_2 \cdot x + \Delta^{N-3} s_2 \cdot x^2 + \cdots + \Delta s_2 \cdot x^{N-2} + s_2 \cdot x^{N-1} \qquad (6.15)$$

$$Q(x) = s_1 + \Delta s_1 \cdot x + \Delta^2 s_1 \cdot x^2 + \cdots + \Delta^{N-2} s_1 \cdot x^{N-2} + \Delta^{N-1} s_1 \cdot x^{N-1} \qquad (6.16)$$

It can be proven that

$$P(x) \cdot Q(x) \bmod (x^N - 1) = R_{s_1 s_2}(N-1) + R_{s_1 s_2}(N-2) \cdot x + R_{s_1 s_2}(N-3) \cdot x^2 + \cdots + R_{s_1 s_2}(1) \cdot x^{N-2} + R_{s_1 s_2}(0) \cdot x^{N-1} \qquad (6.17)$$

The proof of equation (6.17) is omitted as trivial. Obviously, the
desired cross-correlation terms $R_{s_1 s_2}(0), \ldots, R_{s_1 s_2}(N-1)$ of
equations (6.13) can be obtained as the coefficients of the
polynomial product $P(x) \cdot Q(x) \bmod (x^N - 1)$.

Since the coefficients of both P(x) and Q(x) are vectors,
such a polynomial product modulo $x^N - 1$ would require $N^2$ inner
products and N(N - 1) sums of vectors to be performed in
$1 + \log_2 N$ levels of vector operations.

On the other hand, if an RNS implementation of the previously
described multi-channel receiver is considered, where each RNS
channel uses a modulus $m_i$ such that the polynomial $x^N - 1$ can
be factored into N distinct first-order factors in $Z_{m_i}$, the
familiar Nth-order PRNS of Chapter 4 (Section 4.4) can be activated to
perform the cross-correlation in only N parallel vector channels,
requiring N inner products in one level of vector operations instead
of the previous requirement of $N^2$ inner products and N(N - 1) sums
of vectors in $1 + \log_2 N$ vector operation levels. As a result,
extremely high-speed, low-complexity communication correlators can be
designed for multi-channel receivers with the use of the PRNS.
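As a hedged illustration of this N-channel scheme, the sketch below recomputes the N = 4 example of equation (6.14) through evaluation at the roots of $x^4 - 1$. The modulus 337 is an assumption chosen only because it is a prime with 4 dividing 336 and it exceeds every correlation value; the mapping is written as plain evaluation and interpolation at the roots rather than with the closed formulas of Chapter 4, and all function names are hypothetical.

```python
P_MOD = 337  # assumed prime with 4 | (P_MOD - 1); x^4 - 1 then splits in Z_p


def primitive_root_of_unity(n, p):
    """Smallest r > 1 whose multiplicative order modulo p is exactly n."""
    return next(r for r in range(2, p)
                if pow(r, n, p) == 1
                and all(pow(r, k, p) != 1 for k in range(1, n)))


def forward(coeffs, roots, p):
    """Evaluate a polynomial with vector coefficients at every root."""
    width = len(coeffs[0])
    return [[sum(c[t] * pow(r, j, p) for j, c in enumerate(coeffs)) % p
             for t in range(width)] for r in roots]


def inverse(values, roots, p):
    """Recover N scalar coefficients from the values at roots w^0..w^(N-1)."""
    n = len(roots)
    n_inv = pow(n, -1, p)
    return [n_inv * sum(v * pow(r, -m, p) for v, r in zip(values, roots)) % p
            for m in range(n)]


def prns_cross_correlation(s1_blocks, s2_blocks, p=P_MOD):
    n = len(s1_blocks)
    w = primitive_root_of_unity(n, p)
    roots = [pow(w, k, p) for k in range(n)]
    P = list(reversed(s2_blocks))   # P(x) = D^(N-1)s2 + ... + s2 x^(N-1)
    Q = list(s1_blocks)             # Q(x) = s1 + D s1 x + ... + D^(N-1)s1 x^(N-1)
    Ps, Qs = forward(P, roots, p), forward(Q, roots, p)
    # One inner product per channel: N inner products in a single level.
    channel = [sum(a * b for a, b in zip(u, v)) % p for u, v in zip(Ps, Qs)]
    coeffs = inverse(channel, roots, p)
    return list(reversed(coeffs))   # coefficient of x^(N-1-k) is R(k)


s1_blocks = [[4, 5, 6, 7], [8, 7, 6, 5], [4, 3, 2, 1], [0, 1, 2, 3]]
s2_blocks = [[5, 6, 7, 8], [7, 6, 5, 4], [3, 2, 1, 0], [1, 2, 3, 4]]
print(prns_cross_correlation(s1_blocks, s2_blocks))  # [336, 224, 176, 288]
```

Only four inner products are formed in the transformed domain, compared with the sixteen needed by the direct evaluation of equation (6.14); the cost of the forward and inverse mappings is not counted in this sketch.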

As a trivial example showing how a correlator receiver can make
use of the PRNS, consider the case of a two-channel (bi-phase)
transmission, where the cross-correlation of two bi-phase signals
$(s_1, \Delta s_1)$ and $(s_2, \Delta s_2)$ needs to be computed in $Z_{257}$.
For the sake of example, consider
$s_1 = \{s_1(0), s_1(\delta t), s_1(2\delta t), \ldots, s_1(7\delta t), s_1(0), s_1(\delta t), \ldots\}$
= {4, 6, 8, 6, 4, 2, 0, 2, 4, 6, ...} and $s_2(t) = s_1(t + \delta t)$. Our
signals have period $T = 8\delta t$ and the dimensionality of the problem
is N = 2 while $\Delta T = 8\delta t/2 = 4\delta t$.

The terms of the cross-correlation will be

$$\begin{aligned}
R_{s_1 s_2}(0) &= s_1 \cdot s_2 + \Delta s_1 \cdot \Delta s_2 \\
R_{s_1 s_2}(1) &= s_1 \cdot \Delta s_2 + \Delta s_1 \cdot s_2
\end{aligned} \qquad (6.18)$$

where $s_1$ = (4, 6, 8, 6), $\Delta s_1$ = (4, 2, 0, 2), $s_2$ = (6, 8, 6, 4),
and $\Delta s_2$ = (2, 0, 2, 4). Direct substitution gives
$R_{s_1 s_2}(0) \equiv 160 \bmod 257$ and $R_{s_1 s_2}(1) \equiv 96 \bmod 257$.

If the 2nd-order PRNS is to be used for the above
computation, the cross-correlation should be realized as a
polynomial-product statement

$$P(x) \cdot Q(x) \bmod (x^2 - 1) = R_{s_1 s_2}(1) + R_{s_1 s_2}(0) \cdot x \qquad (6.19)$$

where

$$P(x) = \Delta s_2 + s_2 \cdot x, \qquad Q(x) = s_1 + \Delta s_1 \cdot x \qquad (6.20)$$

The second-order PRNS mapping, based on the factorization of
$x^2 - 1$ in a modular ring $Z_p$, must be considered. According to
theorems 4.32 and 4.33 and equations (4.7) and (4.9), the PRNS mapping
$f_2$ maps $A(x) = a_0 + a_1 x$ onto $(a_0^*, a_1^*)$, where
$a_0^* = A(x) \bmod (x - r_0)$ and $a_1^* = A(x) \bmod (x - r_1)$ with $r_0$
and $r_1$ such that $x^2 - 1 \equiv (x - r_0)(x - r_1) \bmod p$. Obviously,
$r_0 \equiv 1$, $r_1 \equiv -1$ and $f_2$ is described by

$$a_0^* \equiv (a_0 + a_1) \bmod p, \qquad a_1^* \equiv (a_0 - a_1) \bmod p \qquad (6.21)$$

while the inverse mapping $f_2^{-1}$ gives

$$a_0 \equiv [2^{-1}(a_0^* + a_1^*)] \bmod p, \qquad a_1 \equiv [2^{-1}(a_0^* - a_1^*)] \bmod p \qquad (6.22)$$

Using $s_1 = \{s_1(0), s_1(\delta t), s_1(2\delta t), s_1(3\delta t)\}$ = (4, 6, 8, 6),
$\Delta s_1 = \{\Delta s_1(0), \Delta s_1(\delta t), \Delta s_1(2\delta t), \Delta s_1(3\delta t)\}$ = (4, 2, 0, 2),
$s_2 = \{s_2(0), s_2(\delta t), s_2(2\delta t), s_2(3\delta t)\}$ = (6, 8, 6, 4), and
$\Delta s_2 = \{\Delta s_2(0), \Delta s_2(\delta t), \Delta s_2(2\delta t), \Delta s_2(3\delta t)\}$ = (2, 0, 2, 4)
and taking into account equations (6.20), (6.21), and (6.22) we get:
$(\Delta s_2(0), s_2(0)) \xrightarrow{f_2} (\Delta s_2^*(0), s_2^*(0))$ or
$(2, 6) \xrightarrow{f_2} (8, 253)$,
$(s_1(0), \Delta s_1(0)) \xrightarrow{f_2} (s_1^*(0), \Delta s_1^*(0))$ or
$(4, 4) \xrightarrow{f_2} (8, 0)$, and the product
$(\Delta s_2^*(0) \cdot s_1^*(0),\ s_2^*(0) \cdot \Delta s_1^*(0)) = (8 \cdot 8,\ 253 \cdot 0) = (64, 0)$.
The other three products give
$(\Delta s_2^*(\delta t) \cdot s_1^*(\delta t),\ s_2^*(\delta t) \cdot \Delta s_1^*(\delta t)) = (64, 225)$,
$(\Delta s_2^*(2\delta t) \cdot s_1^*(2\delta t),\ s_2^*(2\delta t) \cdot \Delta s_1^*(2\delta t)) = (64, 225)$, and
$(\Delta s_2^*(3\delta t) \cdot s_1^*(3\delta t),\ s_2^*(3\delta t) \cdot \Delta s_1^*(3\delta t)) = (64, 0)$.
Adding the products together the result is
[(64, 0) + (64, 225) + (64, 225) + (64, 0)] = (256, 450) $\equiv$ (-1, 193),
while the inverse mapping gives
$(-1, 193) \xrightarrow{f_2^{-1}} (96, 160)$. So $R_{s_1 s_2}(0) = 160$ and
$R_{s_1 s_2}(1) = 96$, which is the correct result.
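The bi-phase example can be retraced step by step in code. This is a minimal sketch of equations (6.19) through (6.22) with p = 257, using the sample values given in the text; the function names `f2` and `f2_inv` are hypothetical labels for the forward and inverse second-order mappings.

```python
P = 257  # modulus used in the text


def f2(a0, a1, p=P):
    """Forward 2nd-order mapping of equation (6.21)."""
    return ((a0 + a1) % p, (a0 - a1) % p)


def f2_inv(a0_star, a1_star, p=P):
    """Inverse 2nd-order mapping of equation (6.22)."""
    half = pow(2, -1, p)  # multiplicative inverse of 2 modulo p
    return (half * (a0_star + a1_star) % p, half * (a0_star - a1_star) % p)


s1, ds1 = [4, 6, 8, 6], [4, 2, 0, 2]   # s1 and Delta s1
s2, ds2 = [6, 8, 6, 4], [2, 0, 2, 4]   # s2 and Delta s2

acc = [0, 0]          # one accumulator per PRNS channel
products = []
for k in range(4):
    p_star = f2(ds2[k], s2[k])   # P(x) = Delta s2 + s2 * x
    q_star = f2(s1[k], ds1[k])   # Q(x) = s1 + Delta s1 * x
    prod = (p_star[0] * q_star[0] % P, p_star[1] * q_star[1] % P)
    products.append(prod)
    acc[0] = (acc[0] + prod[0]) % P
    acc[1] = (acc[1] + prod[1]) % P

# Per equation (6.19): constant term is R(1), coefficient of x is R(0).
r1, r0 = f2_inv(*acc)
print(products)   # [(64, 0), (64, 225), (64, 225), (64, 0)]
print(acc)        # [256, 193]
print(r0, r1)     # 160 96
```

Each sample index contributes one multiplication per channel, and the two accumulators are mapped back only once at the end.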
Figure 6.2 shows, in block diagram form, a bi-channel correlator
receiver that makes use of the second-order PRNS, while the
generalized case of an N-channel receiver using the Nth-order PRNS
is presented in Figure 6.3.

6.3  Other  Multi-Dimensional  Applications  of  the  PRNS 


Another interesting application of the PRNS is the computation
of the Discrete Fourier Transform (DFT) using high-speed convolution
techniques. Rader [Rad68a] showed that, using permutation
techniques, an N-point DFT can always be converted into a circular
convolution when N is a prime number.

As an example, consider the case where N = 5. A 5-point
discrete Fourier transform will take the form

$$X(k) = \sum_{n=0}^{4} x(n)\, W^{nk}, \qquad k = 0, \ldots, 4 \qquad (6.23)$$

where $W = e^{-j(2\pi/5)}$. Equation (6.23) can be rewritten as

$$X(k) = x(0) + \tilde{X}(k), \qquad k = 0, \ldots, 4 \qquad (6.24)$$

with

$$\tilde{X}(k) = \sum_{n=1}^{4} x(n)\, W^{kn}, \qquad k = 0, \ldots, 4 \qquad (6.25)$$
[Figure 6.2 PRNS Bi-Channel Correlator Receiver. Note: SR = Shift Register, AC = Accumulator]

[Figure 6.3 PRNS Based N-Channel Correlator Receiver. Note: SR = Shift Register, AC = Accumulator]


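The matrix representation of equation (6.25) is

$$\begin{bmatrix} \tilde{X}(1) \\ \tilde{X}(2) \\ \tilde{X}(3) \\ \tilde{X}(4) \end{bmatrix} = \begin{bmatrix} W^1 & W^2 & W^3 & W^4 \\ W^2 & W^4 & W^1 & W^3 \\ W^3 & W^1 & W^4 & W^2 \\ W^4 & W^3 & W^2 & W^1 \end{bmatrix} \begin{bmatrix} x(1) \\ x(2) \\ x(3) \\ x(4) \end{bmatrix} \qquad (6.26)$$

After simple permutation operations have been applied in equation
(6.26) [Kol77a] it can be rewritten as

$$\begin{bmatrix} \tilde{X}(1) \\ \tilde{X}(2) \\ \tilde{X}(4) \\ \tilde{X}(3) \end{bmatrix} = \begin{bmatrix} W^1 & W^3 & W^4 & W^2 \\ W^2 & W^1 & W^3 & W^4 \\ W^4 & W^2 & W^1 & W^3 \\ W^3 & W^4 & W^2 & W^1 \end{bmatrix} \begin{bmatrix} x(1) \\ x(3) \\ x(4) \\ x(2) \end{bmatrix} \qquad (6.27)$$

Since equation (6.27) represents a circular-convolution
statement, it can be rewritten as a polynomial product modulo
$(z^4 - 1)$:

$$\tilde{X}(1) + \tilde{X}(2) z + \tilde{X}(4) z^2 + \tilde{X}(3) z^3 = (x(1) + x(3) z + x(4) z^2 + x(2) z^3) \cdot (W^1 + W^2 z + W^4 z^2 + W^3 z^3) \bmod (z^4 - 1) \qquad (6.28)$$

If the operations are chosen to be performed in some modular
ring $Z_m$, for which $z^4 - 1$ can be factored into four distinct factors
modulo m, then the PRNS can provide high-speed, low-complexity
implementations of convolvers and DFT engines.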
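The permutation in equations (6.27) and (6.28) can be checked numerically. The sketch below works over the complex numbers rather than in a modular ring $Z_m$ (an assumption made purely so that the result can be compared with a direct DFT); the function name `dft5_via_cyclic_convolution` is hypothetical.

```python
import cmath


def dft5_via_cyclic_convolution(x):
    """5-point DFT computed through the length-4 cyclic product of (6.28)."""
    W = cmath.exp(-2j * cmath.pi / 5)
    data = [x[1], x[3], x[4], x[2]]        # x(1) + x(3)z + x(4)z^2 + x(2)z^3
    kernel = [W ** 1, W ** 2, W ** 4, W ** 3]
    conv = [0j] * 4
    for i in range(4):
        for j in range(4):
            conv[(i + j) % 4] += data[i] * kernel[j]   # product mod (z^4 - 1)
    X = [sum(x)] + [0j] * 4                # X(0) is the plain sum of inputs
    # Coefficients of z^0..z^3 are X~(1), X~(2), X~(4), X~(3) respectively.
    for idx, k in enumerate([1, 2, 4, 3]):
        X[k] = x[0] + conv[idx]            # equation (6.24)
    return X


x = [1, 2, 3, 4, 5]
W = cmath.exp(-2j * cmath.pi / 5)
direct = [sum(x[n] * W ** (n * k) for n in range(5)) for k in range(5)]
fast = dft5_via_cyclic_convolution(x)
print(max(abs(a - b) for a, b in zip(fast, direct)))  # close to zero
```

In a PRNS realization the inner double loop would be replaced by four independent channel multiplications, provided the chosen modulus admits the required factorization of $z^4 - 1$.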


CHAPTER  7 
CONCLUSIONS 


It has been well established during the past two decades that
the RNS offers the potential for ultra-high-speed data processing
due to the carry-free nature of its arithmetic. The research
conducted here advanced the body of available knowledge about
Residue Number Systems and led to the birth of the so-called
Polynomial Residue Number System of order N (PRNS).

The PRNS was derived in terms of polynomial rings and the
Chinese Remainder Theorem and was based on the factorization in $Z_m$
of polynomials of the form $x^N \pm 1$. An extensive theoretical study
identified all the possible modular rings $Z_m$ in which both $x^N + 1$
and $x^N - 1$ can be factored into N distinct first-order factors
modulo m. It was proven that if the modulus which allows the
factorization of the polynomial $x^N + 1$ (or $x^N - 1$) into N distinct
first-order factors is a prime number or a power of a prime, then
such a factorization is unique. A composite modulus, which consists
of more than one prime, results in more than one polynomial
factorization. The procedure for finding all possible factorizations
was developed. Next, a mapping was derived that transforms a serial
arithmetic system, whose rule of multiplication is
$S_1(x) \cdot S_2(x) \bmod (x^N + 1)$, into a parallel processing system,
where multiplication is performed by multiplying N residue pairs
component by component. The mapping $f_N$ was proven to be an
isomorphism, its inverse $f_N^{-1}$ was developed, and closed formulas
for $f_N$ and $f_N^{-1}$ were provided which demonstrate that the
complexity of both the forward and the inverse mapping is multiplied
by four every time the order of the problem doubles. The analytical
expressions for $f_N$ and $f_N^{-1}$ showed that they are both expressed
in terms of the N roots of the congruence $x^N \pm 1 \equiv 0 \bmod m$,
while a number of theorems established some interesting relationships
between the roots of $x^N + 1 \equiv 0 \bmod m$. These relationships can
lead to more simplified hardware designs.

The  existing  Quadratic  Residue  Number  System  (QRNS)  was  proven 
to  be  a 2nd-order  PRNS  while  its  application  in  the  problem  of 
complex  multiplication  resulted  in  a hardware  savings  of  more  than 
50  percent  and  higher  throughputs  were  obtained.  The  application  of 
the  QRNS  in  the  implementation  of  a radix-4  FFT  was  discussed  and  a 
reduction  of  the  table  look-up  count  from  34  down  to  22  per  FFT 
butterfly  was  reported  when  using  the  QRNS,  while  a reduction  in  the 
number  of  levels  of  operations  from  4 down  to  3 per  butterfly  was 
obtained.  Therefore,  faster  and  more  compact  RNS-FFTs  can  be 
realized  in  less  hardware. 

The low complexity offered by the QRNS in the case of a complex
multiplication resulted in the design of a QRNS Single-Modulus (SM)
complex ALU, and both table look-up and conventional hardware
implementations of such an ALU were discussed. For the case of
conventional hardware implementation, a modulus of the form
$p = 2^n + 1$ was found to provide hardware elegance for QRNS-based
designs. In order to improve the hardware requirements and increase
the speed of such a SM QRNS by reducing the channel width, the use of
an algorithm proposed by Games [Gam, Gam85a] was suggested. This
algorithm decomposes a two-channel complex number into four
polynomial channels, with half the wordlength per channel than
before. It was found that such a decomposition algorithm makes use
of a 4th-order SM PRNS feasible. Such SM PRNS ALU implementations
were discussed for both cases of table look-ups and conventional
hardware designs, and comparisons were performed between a 4th-order
SM QRNS and a 4th-order SM PRNS. It was proven that both the
forward and inverse 4th-order PRNS isomorphic mappings $f_4$ and
$f_4^{-1}$ could be applied with elegance for both implementations, due
to the fact that the congruence $x^4 + 1 \equiv 0 \bmod p$ presents symmetry
within its four distinct roots in $Z_p$. For the case of table look-up
implementations, the SM PRNS approach was shown to be superior to
its corresponding SM QRNS, resulting in great memory savings. For
conventional hardware implementations, it was proven that a modulus
of the form $p = 2^n + 1$ for the SM PRNS would be the best possible
choice from a hardware complexity standpoint. The SM PRNS was shown
to outperform the corresponding SM QRNS in terms of speed,
especially in the case of multiplicative intensive environments.
Unfortunately, the non-systematic nature of Games' decomposition
algorithm, as well as the nonavailability at this time of an
efficient scaling algorithm for the SM complex PRNS, make it
impossible for such a system to achieve its full potential, and
future research should be conducted to solve the above two problems.

In the case of real arithmetic, the Nth-order PRNS was also used
as a beneficial tool for memory reduction and speed increase in
single-modulus environments. The problem of performing the
multiplication $(A \cdot B) \bmod m$ was considered, where $A, B \in Z_m$ and
$m = 2^n + 1$, while an algorithm was provided to decompose the above
product into N identically wide polynomial channels, each having a
smaller width than the original channel. Then an Nth-order SM PRNS
was used to process the above product in N concurrent paths with
less hardware than if $(A \cdot B) \bmod m$ was not decomposed. The
decomposition of $(A \cdot B) \bmod (2^n \pm 1)$ into N equally wide
polynomial channels with the same modulus was compared to the
decomposition of
the  above  product  using  a 3-moduli  RNS  with  moduli  set  P = {2^  - 1, 
2\  2^  + 1}.  It  was  proven  that  for  n > 40  and  N > 8 the  Nth-order 
SM  PRNS  was  able  to  offer  38  to  99  percent  memory  savings  versus  the 
3-moduli  RNS  in  table  look-up  designs.  In  the  above  case  (n  > 40,  N 
> 8)  higher  throughputs  for  both  table  look-up  and  non  table  look-up 
implementations  were  achieved.  Finally,  an  efficient  scaling 
technique  for  our  real  SM  PRNS  was  developed  capable  of  scaling  by  a 
scale  factor  K.  = 2^  with  minimal  use  of  the  polynomial  CRT. 

Although the PRNS can be used successfully in complex and real
arithmetic, the full potential of such a system can be seen in
applications of high dimensionality, because if the Nth-order PRNS
is applied to an N-dimensional problem, no time or hardware is wasted
in encoding and decoding. One such multi-dimensional problem, where
the PRNS can be applied with great success, is the design of
multi-channel communication correlator receivers. It was
demonstrated that if the Nth-order PRNS is used in the design of an
N-channel correlator, a correlation can be performed in N parallel
channels with only N vector products, and only one level of vector
operations is required. On the other hand, if the PRNS is not
used, the requirement is considerably higher: $N^2$ vector
products and N(N - 1) vector adds in $1 + \log_2 N$ levels of vector
operations. As a result, ultra-high-speed, low-complexity receivers
can now be designed using the PRNS.

While  the  design  of  a communication  correlator  is  one  of  the 
most  important  challenges,  there  are  many  other  interesting  DSP 
problems  that  can  be  encoded  as  polynomial  products  where  the  PRNS 
can  offer  significant  assistance.  Such  problems  would  be 
computations  of  convolutions,  DFTs,  or  problems  in  the  area  of 
Multidimensional  Signal  Processing.  The  future  researcher  and 
applied  scientist  will  definitely  find  it  a very  fruitful  experience 
to  work  with  PRNS  applications. 